DePaul University Jarvis College of Computing and Digital Media¶


DSC 478 Final Project¶

Project Title:        Potential Revenue Loss and Gain Prediction for Hotel Reservations
Project Type:       Data Analysis
Team Members:  Bramhashree Raghava Pillai Manoharan,   Hoda Masteri,   Goutham Selvakumar

Outline¶

  • 1 - Data Collection

    • 1.1 Dataset
    • 1.2 Importing the Data
  • 2 - Data Preprocessing

    • 2.1 Summary Statistics
    • 2.2 Data Cleaning
    • 2.3 Data Visualization and Exploration
    • 2.4 Feature Engineering
    • 2.5 Data Visualization for Target Variable
  • 3 - Supervised Knowledge Discovery

    • 3.1 KNN
    • 3.2 Decision Tree
    • 3.3 SVM
    • 3.4 Naive Bayes + MultiNomial
    • 3.5 Linear Discriminant Analysis
    • 3.6 Random Forest
  • 4 - Unsupervised Knowledge Discovery

    • 4.1 Clustering
    • 4.2 Qualitative Analysis of Clusters
  • 5 - Comparison and Conclusion

1 - Data Collection:¶

1.1 - Dataset: Hotel booking demand¶

       Variables and Descriptions:

Hotel:  Resort Hotel or City Hotel
is_canceled:  Indicating if the booking was canceled/no-show (1) or checked-out (0) - This is a compact version of reservation_status attribute
lead_time:  Number of days between the entering date of the booking and the arrival date
arrival_date_year:  Year of arrival date
arrival_date_month:  Month of arrival date
arrival_date_week_number:  Week number of year for arrival date
arrival_date_day_of_month:  Day of arrival date
stays_in_weekend_nights:  Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay
stays_in_week_nights:  Number of week nights (Monday to Friday) the guest stayed or booked to stay
adults:  Number of adults
children:  Number of children
babies:  Number of babies
meal:  Undefined (no meal), SC(no meal), BB(1 meal), HB(2 meals), FB(3 meals)
country:   Country of origin
market_segment:  Market segment designation (abbreviations: TA:Travel Agents; TO:Tour Operators)
distribution_channel:  Booking distribution channel (abbreviations: TA:Travel Agents; TO:Tour Operators; GDS: Global Distribution System)
is_repeated_guest:  Indicates if the booking name was from a repeated guest (1) or not (0)
previous_cancellations:  Number of previous bookings that were cancelled prior to the current booking
previous_bookings_not_canceled:  Number of previous bookings not cancelled prior to the current booking
reserved_room_type:   Code of room type reserved
assigned_room_type:   The assigned room type may differ from the reserved room type due to hotel operation reasons or customer request.
booking_changes  Number of changes made to the booking until the moment of check-in or cancellation
deposit_type:  No Deposit: no pymt; Non Refund: deposit paid covers total stay cost; Refundable: deposit value is under the total cost of stay
agent:  ID of the travel agency that made the booking
company:  ID of the company that made the booking or responsible for paying the booking
days_in_waiting_list:  Number of days the booking was in the waiting list before it was confirmed
customer_type:  Type of booking. Contract:the booking has a contract associated to it. Group: the booking is associated to a group; Transient: the booking is not part of a group or contract, and is not associated to other transient booking; Transient-party: the booking is transient, but is associated to at least other transient booking
adr:  Average Daily Rate (the sum of all lodging transactions by the total number of staying nights)
required_car_parking_spaces:  Number of car parking spaces required by the customer
total_of_special_requests:  Number of special requests made by the customer (e.g. twin bed or high floor)
reservation_status:  Reservation last status: Canceled, Check-Out, No-Show
reservation_status_date:  Date at which the last status was set; Can be used with the Reservation_status to understand when was the booking canceled or when did the customer checkedout

1.2 - Importing the Data:¶

In [1]:
# Importing packages and libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import pylab as pl
In [2]:
# Importing the data:
hotel = pd.read_csv("hotel_bookings.csv")
np.set_printoptions(suppress=True, precision=5)
pd.set_option('display.precision', 5)
print(f"hotel dataframe has {hotel.shape[0]} rows and {hotel.shape[1]} columns.\nThe first 5 rows are:")
hotel.head()
hotel dataframe has 119390 rows and 32 columns.
The first 5 rows are:
Out[2]:
hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults ... deposit_type agent company days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests reservation_status reservation_status_date
0 Resort Hotel 0 342 2015 July 27 1 0 0 2 ... No Deposit NaN NaN 0 Transient 0.0 0 0 Check-Out 2015-07-01
1 Resort Hotel 0 737 2015 July 27 1 0 0 2 ... No Deposit NaN NaN 0 Transient 0.0 0 0 Check-Out 2015-07-01
2 Resort Hotel 0 7 2015 July 27 1 0 1 1 ... No Deposit NaN NaN 0 Transient 75.0 0 0 Check-Out 2015-07-02
3 Resort Hotel 0 13 2015 July 27 1 0 1 1 ... No Deposit 304.0 NaN 0 Transient 75.0 0 0 Check-Out 2015-07-02
4 Resort Hotel 0 14 2015 July 27 1 0 2 2 ... No Deposit 240.0 NaN 0 Transient 98.0 0 1 Check-Out 2015-07-03

5 rows × 32 columns

2 - Data Preprocessing:¶

2.1 - Summary Statistics:¶

In [163]:
# Five-number summary and beyond:
hotel.style.set_properties(**{"font-size":"11 px"})
hotel.describe(include = 'all').T
Out[163]:
count unique top freq first last mean std min 25% 50% 75% max
hotel 119206 2 City Hotel 79159 NaT NaT NaN NaN NaN NaN NaN NaN NaN
is_canceled 119206.0 NaN NaN NaN NaT NaT 0.37074 0.48301 0.0 0.0 0.0 1.0 1.0
lead_time 119206.0 NaN NaN NaN NaT NaT 104.11262 106.87564 0.0 18.0 69.0 161.0 737.0
arrival_date_year 119206.0 NaN NaN NaN NaT NaT 2016.15651 0.70747 2015.0 2016.0 2016.0 2017.0 2017.0
arrival_date_month 119206 12 August 13857 NaT NaT NaN NaN NaN NaN NaN NaN NaN
arrival_date_week_number 119206.0 NaN NaN NaN NaT NaT 27.1632 13.6013 1.0 16.0 28.0 38.0 53.0
arrival_date_day_of_month 119206.0 NaN NaN NaN NaT NaT 15.79903 8.78102 1.0 8.0 16.0 23.0 31.0
stays_in_weekend_nights 119206.0 NaN NaN NaN NaT NaT 0.92706 0.99512 0.0 0.0 1.0 2.0 19.0
stays_in_week_nights 119206.0 NaN NaN NaN NaT NaT 2.4992 1.89711 0.0 1.0 2.0 3.0 50.0
adults 119206.0 NaN NaN NaN NaT NaT 1.85919 0.57519 0.0 2.0 2.0 2.0 55.0
children 119206.0 NaN NaN NaN NaT NaT 0.10405 0.39884 0.0 0.0 0.0 0.0 10.0
babies 119206.0 NaN NaN NaN NaT NaT 0.00796 0.09751 0.0 0.0 0.0 0.0 10.0
meal 119206 4 BB 92232 NaT NaT NaN NaN NaN NaN NaN NaN NaN
country 119206 5 listed_other 52563 NaT NaT NaN NaN NaN NaN NaN NaN NaN
market_segment 119206 7 Online TA 56407 NaT NaT NaN NaN NaN NaN NaN NaN NaN
distribution_channel 119206 5 TA/TO 97750 NaT NaT NaN NaN NaN NaN NaN NaN NaN
is_repeated_guest 119206.0 NaN NaN NaN NaT NaT 0.0315 0.17467 0.0 0.0 0.0 0.0 1.0
previous_cancellations 119206.0 NaN NaN NaN NaT NaT 0.08719 0.84493 0.0 0.0 0.0 0.0 26.0
previous_bookings_not_canceled 119206.0 NaN NaN NaN NaT NaT 0.1371 1.49816 0.0 0.0 0.0 0.0 72.0
reserved_room_type 119206 9 A 85873 NaT NaT NaN NaN NaN NaN NaN NaN NaN
assigned_room_type 119206 11 A 74020 NaT NaT NaN NaN NaN NaN NaN NaN NaN
booking_changes 119206.0 NaN NaN NaN NaT NaT 0.21881 0.63851 0.0 0.0 0.0 0.0 18.0
deposit_type 119206 3 No Deposit 104457 NaT NaT NaN NaN NaN NaN NaN NaN NaN
agent 119206 3 listed_other 95741 NaT NaT NaN NaN NaN NaN NaN NaN NaN
days_in_waiting_list 119206.0 NaN NaN NaN NaT NaT 2.32129 17.59829 0.0 0.0 0.0 0.0 391.0
customer_type 119206 4 Transient 89476 NaT NaT NaN NaN NaN NaN NaN NaN NaN
adr 119206.0 NaN NaN NaN NaT NaT 101.97152 50.43287 -6.38 69.5 94.95 126.0 5400.0
required_car_parking_spaces 119206.0 NaN NaN NaN NaT NaT 0.06256 0.24536 0.0 0.0 0.0 0.0 8.0
total_of_special_requests 119206.0 NaN NaN NaN NaT NaT 0.57148 0.79288 0.0 0.0 0.0 1.0 5.0
reservation_status 119206 3 Check-Out 75011 NaT NaT NaN NaN NaN NaN NaN NaN NaN
reservation_status_date 119206 926 2015-10-21 00:00:00 1460 2014-10-17 2017-09-14 NaN NaN NaN NaN NaN NaN NaN
arrival_date 119206 793 2015-12-05 00:00:00 448 2015-07-01 2017-08-31 NaN NaN NaN NaN NaN NaN NaN
Difference 119206.0 NaN NaN NaN NaT NaT 29.72146 70.10175 -69.0 -3.0 -1.0 26.0 526.0

2.2 - Data Cleaning:¶

In [4]:
hotel.info()     # Showing the data types and number of non-null elements for each column
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 119390 entries, 0 to 119389
Data columns (total 32 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   hotel                           119390 non-null  object 
 1   is_canceled                     119390 non-null  int64  
 2   lead_time                       119390 non-null  int64  
 3   arrival_date_year               119390 non-null  int64  
 4   arrival_date_month              119390 non-null  object 
 5   arrival_date_week_number        119390 non-null  int64  
 6   arrival_date_day_of_month       119390 non-null  int64  
 7   stays_in_weekend_nights         119390 non-null  int64  
 8   stays_in_week_nights            119390 non-null  int64  
 9   adults                          119390 non-null  int64  
 10  children                        119386 non-null  float64
 11  babies                          119390 non-null  int64  
 12  meal                            119390 non-null  object 
 13  country                         118902 non-null  object 
 14  market_segment                  119390 non-null  object 
 15  distribution_channel            119390 non-null  object 
 16  is_repeated_guest               119390 non-null  int64  
 17  previous_cancellations          119390 non-null  int64  
 18  previous_bookings_not_canceled  119390 non-null  int64  
 19  reserved_room_type              119390 non-null  object 
 20  assigned_room_type              119390 non-null  object 
 21  booking_changes                 119390 non-null  int64  
 22  deposit_type                    119390 non-null  object 
 23  agent                           103050 non-null  float64
 24  company                         6797 non-null    float64
 25  days_in_waiting_list            119390 non-null  int64  
 26  customer_type                   119390 non-null  object 
 27  adr                             119390 non-null  float64
 28  required_car_parking_spaces     119390 non-null  int64  
 29  total_of_special_requests       119390 non-null  int64  
 30  reservation_status              119390 non-null  object 
 31  reservation_status_date         119390 non-null  object 
dtypes: float64(4), int64(16), object(12)
memory usage: 29.1+ MB
In [5]:
# Showing the columns containing Null values:
null_count = pd.DataFrame(hotel.isnull().sum()[hotel.isnull().sum()!=0], columns =["Null-count"])
print("Columns with Null values\n","-"*22,"\n", null_count)
Columns with Null values
 ---------------------- 
           Null-count
children           4
country          488
agent          16340
company       112593

Nulls in children column: There are only 4 out of 119390 instances that contain Null in the Children column. Dropping the 4 rows seems to be the best action since we can't really infer the number of children from any other variable, and replacing Nulls with mode of that column won't make it an accurate guess either.

In [6]:
hotel.dropna(subset = ["children"], inplace=True, axis = 0)   # Drop 4 rows containing Null in children column

Nulls in country column:

In [7]:
# Showing the cancellation status counts among countries in a pivot table:

new_df = hotel[["country","is_canceled"]]
new_df = np.array(new_df)
dic_canc = {}
dic_notcan = {}

for row in new_df:                  # Iterate over rows of new_df
    if row[1]==1:                   # If canceled:
        if row[0] in dic_canc:           # If country exists in dic_canc:
            dic_canc[row[0]]+=1              # increment the value count
        else:
            dic_canc[row[0]]=1           # Otherwise add the country to the dic_canc
    if row[1]==0:                   # If not canceled:
        if row[0] in dic_notcan:         # If country exists in dic_notcan:
            dic_notcan[row[0]]+=1            # increment the value count
        else:
            dic_notcan[row[0]]=1         # Otherwise add the country to the dic_notcan
            
canceled_df = pd.DataFrame(dic_canc, index = ["is_canceled"]).T
notCanceled_df = pd.DataFrame(dic_notcan, index = ["not_canceled"]).T

pivot_df = pd.concat([canceled_df, notCanceled_df], axis=1)   # concat is outer join by default
pivot_df.fillna(0, inplace=True)      # fill the nulls generated as a result of concat (outer join)
pivot_df = pivot_df.astype(int)
pivot_df["total"] = pivot_df.sum(axis=1)
pivot_df["cancellation_percentage"]= 100 * pivot_df.is_canceled/pivot_df.total
print("\nThe mean cancellation percentage is", pivot_df.cancellation_percentage.mean())

# Any country occuring less than 500 times in the dataset is likely not much reliable to use for any prediction
high_cancel_country = pivot_df.query('total>500 & (cancellation_percentage>50)')
low_cancel_country = pivot_df.query('total>500 & (cancellation_percentage <20)')
low_cancel_country = low_cancel_country.sort_values(by=["total", "cancellation_percentage"], ascending = [False, True])
display(high_cancel_country)
display(low_cancel_country)
The mean cancellation percentage is 30.204589339202563
is_canceled not_canceled total cancellation_percentage
PRT 27515 21071 48586 56.63154
is_canceled not_canceled total cancellation_percentage
FRA 1934 8481 10415 18.56937
DEU 1218 6069 7287 16.71470
NLD 387 1717 2104 18.39354
CN 254 1025 1279 19.85927
AUT 230 1033 1263 18.21061

Country has 488 null values. Replacing them with "unknown" appears to be the best option. In addition, the country code PRT seems to be very distinctive from the rest, because it has higher cancellation rate compared to others. On the other hand, countries like FRA and DEU have significantly lower cancellation rates. Thus, these values should remain as is. The rest can go into a single bin "other". Countries that are not listed frequently really won't help us in predicting anything, because the high or low cancellation rate may be an accident rather than a pattern.

In [8]:
hotel["country"].fillna(value="unknown", inplace=True)      # Filling the 488 NA's with "unknown"
# Saving 'FRA','DEU','PRT', and 'unknown' in a np array to preserve them
pres = np.array(['FRA','DEU','PRT','unknown'])
# Replace any other country not in the pres list, by "listed_other":
hotel['country'] = hotel['country'].apply(lambda i: i if i in pres else 'listed_other')

# Now we only have a few unique values is the country column, shown below with counts:
display(pd.crosstab(hotel["country"], hotel["is_canceled"]))
is_canceled 0 1
country
DEU 6069 1218
FRA 8481 1934
PRT 21071 27515
listed_other 39124 13486
unknown 421 67
In [9]:
print("Normalized value counts (percentages):\n","-"*35)
print(hotel.country.value_counts(normalize=True)*100)  
Normalized value counts (percentages):
 -----------------------------------
listed_other    44.06714
PRT             40.69656
FRA              8.72380
DEU              6.10373
unknown          0.40876
Name: country, dtype: float64

Nulls in agent column:

In [10]:
# Showing the cancellation status counts among agents in a pivot table:

new_df = hotel[["agent","is_canceled"]].copy()
new_df.agent = (new_df.agent.astype('Int64')).astype(str)
new_df = np.array(new_df)

dic_canc = {}
dic_notcan = {}

for row in new_df:                  # Iterate over rows of new_df
    if row[1]==1:                   # If canceled:
        if row[0] in dic_canc:           # If agent exists in dic_canc:
            dic_canc[row[0]]+=1              # increment the value count
        else:
            dic_canc[row[0]]=1           # Otherwise add the agent to the dic_canc
    if row[1]==0:                   # If not canceled:
        if row[0] in dic_notcan:         # If agent exists in dic_notcan:
            dic_notcan[row[0]]+=1            # increment the value count
        else:
            dic_notcan[row[0]]=1         # Otherwise add the agent to the dic_notcan
            
canceled_df = pd.DataFrame(dic_canc, index = ["is_canceled"]).T
notCanceled_df = pd.DataFrame(dic_notcan, index = ["not_canceled"]).T

pivot_df = pd.concat([canceled_df, notCanceled_df], axis=1)   # concat is outer join by default
pivot_df.fillna(0, inplace=True)      # fill the nulls generated as a result of concat (outer join)
pivot_df = pivot_df.astype(int)
pivot_df["total"] = pivot_df.sum(axis=1)
pivot_df["cancellation_percentage"]= 100 * pivot_df.is_canceled/pivot_df.total
print("The mean cancellation percentage is", pivot_df.cancellation_percentage.mean())

# Any agent occuring less than 500 times in the dataset is likely not much reliable to use for prediction
high_cancel_agent = pivot_df.query('total>500 & (cancellation_percentage>50)')
high_cancel_agent = high_cancel_agent.sort_values(by=["total", "cancellation_percentage"], ascending = [False, False])
low_cancel_agent = pivot_df.query('total>500 & (cancellation_percentage <15)')
low_cancel_agent = low_cancel_agent.sort_values(by=["total", "cancellation_percentage"], ascending = [False, True])

display(high_cancel_agent)
display(low_cancel_agent)
The mean cancellation percentage is 25.187683521644328
is_canceled not_canceled total cancellation_percentage
1 5280 1911 7191 73.42511
3 771 565 1336 57.70958
37 717 513 1230 58.29268
19 780 281 1061 73.51555
21 506 369 875 57.82857
229 484 302 786 61.57761
29 546 137 683 79.94143
12 304 274 578 52.59516
20 359 181 540 66.48148
is_canceled not_canceled total cancellation_percentage
7 474 3065 3539 13.39361
241 236 1485 1721 13.71296
28 110 1556 1666 6.60264
40 84 955 1039 8.08470
243 28 486 514 5.44747

agent column has 16340 null values. Replacing them with -1 appears to be the best option (later will convert to string "unknown"). In addition, agent 1 seems to be very distinctive from the rest, because it has higher cancellation rate compared to others. Other agents have significantly lower cancellation rates, more close to the average cancellation rate. Thus, agent 1 should remain as is. The rest can go into a single bin "other". Agents that are not listed frequently really won't help us in predicting anything, because the high or low cancellation rate may be an accident rather than a pattern.

In [11]:
hotel["agent"].fillna(value=-1.0, inplace=True)      # Filling the 16340 NA's with -1.0
hotel["agent"] = hotel["agent"].astype(int)
hotel["agent"] = hotel["agent"].astype(str)
hotel.loc[hotel["agent"] == "-1", "agent"] = "unknown"
# Saving unknown, "1", "unknown" in a np array to preserve them
pres = np.array(["unknown", "1"])
# Replace any other agent not in the pres list, by "listed_other":
hotel['agent'] = hotel['agent'].apply(lambda i: i if i in pres else 'listed_other')

# Now we only have a few unique values is the agent column, shown below with counts:
display(pd.crosstab(hotel["agent"], hotel["is_canceled"]))
is_canceled 0 1
agent
1 1911 5280
listed_other 60945 34912
unknown 12310 4028
In [12]:
print("Normalized value counts (percentages) for unique agents:\n","-"*53)
print(hotel.agent.value_counts(normalize=True)*100)  
Normalized value counts (percentages) for unique agents:
 -----------------------------------------------------
listed_other    80.29166
unknown         13.68502
1                6.02332
Name: agent, dtype: float64

Nulls in company column:

In [13]:
# Showing the cancellation status counts among companies in a pivot table:

new_df = hotel[["company","is_canceled"]].copy()
new_df.company = (new_df.company.astype('Int64')).astype(str)
new_df = np.array(new_df)

dic_canc = {}
dic_notcan = {}

for row in new_df:                  # Iterate over rows of new_df
    if row[1]==1:                   # If canceled:
        if row[0] in dic_canc:           # If company exists in dic_canc:
            dic_canc[row[0]]+=1              # increment the value count
        else:
            dic_canc[row[0]]=1           # Otherwise add the company to the dic_canc
    if row[1]==0:                   # If not canceled:
        if row[0] in dic_notcan:         # If company exists in dic_notcan:
            dic_notcan[row[0]]+=1            # increment the value count
        else:
            dic_notcan[row[0]]=1         # Otherwise add the company to the dic_notcan
            
canceled_df = pd.DataFrame(dic_canc, index = ["is_canceled"]).T
notCanceled_df = pd.DataFrame(dic_notcan, index = ["not_canceled"]).T

pivot_df = pd.concat([canceled_df, notCanceled_df], axis=1)   # concat is outer join by default
pivot_df.fillna(0, inplace=True)      # fill the nulls generated as a result of concat (outer join)
pivot_df = pivot_df.astype(int)
pivot_df["total"] = pivot_df.sum(axis=1)
pivot_df["cancellation_percentage"]= 100 * pivot_df.is_canceled/pivot_df.total
print("The mean cancellation percentage is", pivot_df.cancellation_percentage.mean())

# Any company occuring less than 300 times in the dataset is likely not much reliable to use for prediction
high_cancel_co = pivot_df.query('total>400 & (cancellation_percentage>50)')
high_cancel_co = high_cancel_co.sort_values(by=["total", "cancellation_percentage"], ascending = [False, False])
low_cancel_co = pivot_df.query('total>300 & (cancellation_percentage <5)')
low_cancel_co = low_cancel_co.sort_values(by=["total", "cancellation_percentage"], ascending = [False, True])

display(high_cancel_co)
display(low_cancel_co)
The mean cancellation percentage is 13.41212281412652
is_canceled not_canceled total cancellation_percentage
is_canceled not_canceled total cancellation_percentage

Since no company stands out among others, and since 112593 out of 119390 rows have a null company value, the best action is to drop the company column entirely:

In [14]:
#company has 94% missing values
hotel.drop(columns = 'company', inplace=True)
In [15]:
#After handling missing values of all 4 columns (children, country, agent, company)
# Checking if any columns still containing Null values:
null_count = pd.DataFrame(hotel.isnull().sum()[hotel.isnull().sum()!=0], columns =["Null-count"])
print("Columns with Null values\n","-"*22,"\n", null_count)
Columns with Null values
 ---------------------- 
 Empty DataFrame
Columns: [Null-count]
Index: []
In [16]:
# No Nulls left in the dataframe.

Data quality assessment: Checking the soundness of values, finding mismatched data types, etc.

In [17]:
# meal: Undefined (no meal), SC(no meal), BB(1 meal), HB(2 meals), FB(3 meals)
print("Meal unique values are: " , hotel.meal.unique())

# combining Undefined and SC because they mean the same thing based on the description (=no meal):
hotel.meal.replace("Undefined", "SC", inplace=True)
print("Meal unique values after modification: ", hotel.meal.unique())
Meal unique values are:  ['BB' 'FB' 'HB' 'SC' 'Undefined']
Meal unique values after modification:  ['BB' 'FB' 'HB' 'SC']
In [18]:
hotel.shape   # 4 rows containing null children was dropped earlier 119390 -4 = 119386
Out[18]:
(119386, 31)
In [19]:
# Checking the categorical columns (object type):
hotel[hotel.columns[np.array(hotel.dtypes==object)]].head()
Out[19]:
hotel arrival_date_month meal country market_segment distribution_channel reserved_room_type assigned_room_type deposit_type agent customer_type reservation_status reservation_status_date
0 Resort Hotel July BB PRT Direct Direct C C No Deposit unknown Transient Check-Out 2015-07-01
1 Resort Hotel July BB PRT Direct Direct C C No Deposit unknown Transient Check-Out 2015-07-01
2 Resort Hotel July BB listed_other Direct Direct A C No Deposit unknown Transient Check-Out 2015-07-02
3 Resort Hotel July BB listed_other Corporate Corporate A A No Deposit listed_other Transient Check-Out 2015-07-02
4 Resort Hotel July BB listed_other Online TA TA/TO A A No Deposit listed_other Transient Check-Out 2015-07-03
In [20]:
# Changing the reservation_status_date from object to date:
hotel.reservation_status_date = pd.to_datetime(hotel.reservation_status_date)
In [21]:
# checking the numerical columns:
hotel.select_dtypes(include=['int64', 'float64']).head()
Out[21]:
is_canceled lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest previous_cancellations previous_bookings_not_canceled booking_changes days_in_waiting_list adr required_car_parking_spaces total_of_special_requests
0 0 342 2015 27 1 0 0 2 0.0 0 0 0 0 3 0 0.0 0 0
1 0 737 2015 27 1 0 0 2 0.0 0 0 0 0 4 0 0.0 0 0
2 0 7 2015 27 1 0 1 1 0.0 0 0 0 0 0 0 75.0 0 0
3 0 13 2015 27 1 0 1 1 0.0 0 0 0 0 0 0 75.0 0 0
4 0 14 2015 27 1 0 2 2 0.0 0 0 0 0 0 0 98.0 0 1
In [22]:
# Children column type is float; changing the type to int:
hotel.children = hotel.children.astype(int)
In [23]:
# Ignoring rows where number of guests in total is zero:

baddata = (hotel.children == 0) & (hotel.adults == 0) & (hotel.babies == 0)
print(f"Rows with zero guests: {hotel[baddata].shape[0]}")
hotel = hotel[~baddata]
print(f"Rows left after ignoring rows with zero guests: {hotel.shape[0]}")
Rows with zero guests: 180
Rows left after ignoring rows with zero guests: 119206
In [24]:
# Checking the relation between status and other variables:
pd.set_option('display.max_rows', None)
hotel.groupby("reservation_status").describe().T     
# is_canceled column is similar to reservation_status, but with no-show and cancel cases combined and replaced by 1
Out[24]:
reservation_status Canceled Check-Out No-Show
is_canceled count 42989.00000 75011.00000 1206.00000
mean 1.00000 0.00000 1.00000
std 0.00000 0.00000 0.00000
min 1.00000 0.00000 1.00000
25% 1.00000 0.00000 1.00000
50% 1.00000 0.00000 1.00000
75% 1.00000 0.00000 1.00000
max 1.00000 0.00000 1.00000
lead_time count 42989.00000 75011.00000 1206.00000
mean 147.35858 80.08203 57.22886
std 118.81389 91.13780 66.63515
min 0.00000 0.00000 0.00000
25% 51.00000 9.00000 5.00000
50% 116.00000 45.00000 30.00000
75% 219.00000 124.00000 99.75000
max 629.00000 737.00000 385.00000
arrival_date_year count 42989.00000 75011.00000 1206.00000
mean 2016.17307 2016.14745 2016.13018
std 0.71610 0.70310 0.65600
min 2015.00000 2015.00000 2015.00000
25% 2016.00000 2016.00000 2016.00000
50% 2016.00000 2016.00000 2016.00000
75% 2017.00000 2017.00000 2017.00000
max 2017.00000 2017.00000 2017.00000
arrival_date_week_number count 42989.00000 75011.00000 1206.00000
mean 27.40303 27.07656 24.00332
std 13.01656 13.89714 14.88796
min 1.00000 1.00000 1.00000
25% 17.00000 16.00000 10.00000
50% 28.00000 28.00000 22.00000
75% 38.00000 38.00000 36.00000
max 53.00000 53.00000 53.00000
arrival_date_day_of_month count 42989.00000 75011.00000 1206.00000
mean 15.69606 15.83881 16.99502
std 8.77620 8.77649 9.12704
min 1.00000 1.00000 1.00000
25% 8.00000 8.00000 9.00000
50% 16.00000 16.00000 18.00000
75% 23.00000 23.00000 25.00000
max 31.00000 31.00000 31.00000
stays_in_weekend_nights count 42989.00000 75011.00000 1206.00000
mean 0.92056 0.92806 1.09619
std 1.00014 0.98782 1.22723
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 1.00000 1.00000 1.00000
75% 2.00000 2.00000 2.00000
max 16.00000 19.00000 8.00000
stays_in_week_nights count 42989.00000 75011.00000 1206.00000
mean 2.56091 2.46200 2.61360
std 1.84856 1.90742 2.72288
min 0.00000 0.00000 0.00000
25% 1.00000 1.00000 1.00000
50% 2.00000 2.00000 2.00000
75% 3.00000 3.00000 3.00000
max 40.00000 50.00000 21.00000
adults count 42989.00000 75011.00000 1206.00000
mean 1.90947 1.83352 1.66418
std 0.67866 0.50415 0.55337
min 0.00000 0.00000 0.00000
25% 2.00000 2.00000 1.00000
50% 2.00000 2.00000 2.00000
75% 2.00000 2.00000 2.00000
max 55.00000 4.00000 3.00000
children count 42989.00000 75011.00000 1206.00000
mean 0.10624 0.10256 0.11857
std 0.40879 0.39121 0.49733
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 3.00000 3.00000 10.00000
babies count 42989.00000 75011.00000 1206.00000
mean 0.00375 0.01040 0.00663
std 0.06184 0.11312 0.08121
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 2.00000 10.00000 1.00000
is_repeated_guest count 42989.00000 75011.00000 1206.00000
mean 0.01175 0.04273 0.03731
std 0.10775 0.20224 0.18961
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 1.00000 1.00000 1.00000
previous_cancellations count 42989.00000 75011.00000 1206.00000
mean 0.21412 0.01576 0.00580
std 1.35078 0.27249 0.10370
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 26.00000 13.00000 3.00000
previous_bookings_not_canceled count 42989.00000 75011.00000 1206.00000
mean 0.02261 0.20306 0.11526
std 0.64554 1.81202 1.42909
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 58.00000 72.00000 44.00000
booking_changes count 42989.00000 75011.00000 1206.00000
mean 0.09456 0.28978 0.23300
std 0.44322 0.71724 0.66040
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 16.00000 18.00000 6.00000
days_in_waiting_list count 42989.00000 75011.00000 1206.00000
mean 3.66647 1.58769 0.00000
std 21.78664 14.78122 0.00000
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 391.00000 379.00000 0.00000
adr count 42989.00000 75011.00000 1206.00000
mean 105.27101 100.16921 96.45837
std 52.70032 49.07064 44.96386
min 0.00000 -6.38000 0.00000
25% 73.00000 67.76000 69.50000
50% 96.30000 92.70000 86.39500
75% 127.98000 125.00000 118.80000
max 5400.00000 510.00000 328.67000
required_car_parking_spaces count 42989.00000 75011.00000 1206.00000
mean 0.00000 0.09941 0.00000
std 0.00000 0.30333 0.00000
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 0.00000 0.00000
75% 0.00000 0.00000 0.00000
max 0.00000 8.00000 0.00000
total_of_special_requests count 42989.00000 75011.00000 1206.00000
mean 0.32311 0.71446 0.53234
std 0.64467 0.83403 0.76665
min 0.00000 0.00000 0.00000
25% 0.00000 0.00000 0.00000
50% 0.00000 1.00000 0.00000
75% 0.00000 1.00000 1.00000
max 5.00000 5.00000 4.00000
In [25]:
# Information inferred from the above table:

# Reservations that get canceled: lead-time is high, previous cancellations are high, higher days in waitlist, etc.
# Reservations where guest stayed: low average of previous cancellations, higher number of special requests, etc.

2.3 - Data Visualization and Exploration:¶

In [26]:
# Reservation status count visualization
status_percentages = hotel["reservation_status"].value_counts(normalize=True)*100
print("Status percentages:","\n","-"*20)
print(status_percentages)
Status percentages: 
 --------------------
Check-Out    62.92552
Canceled     36.06278
No-Show       1.01169
Name: reservation_status, dtype: float64
In [27]:
status_percentages.plot(kind='bar', figsize=(4,3), ylabel="percentage");
In [28]:
# Relation between hotel type and cancellations:
pd.crosstab(hotel.hotel,hotel.is_canceled).plot.bar(figsize=(4,3), ylabel="count");
plt.show()
In [29]:
# Relation between deposit type and cancellations:
pd.crosstab(hotel.deposit_type,hotel.is_canceled).plot.bar(figsize=(4,3), ylabel="count");
plt.show()
In [30]:
# Relation between reservation status (counts of 1's for canceled and counts of 0's for not canceled) and arrival month
hotel.groupby(["arrival_date_month","is_canceled"])["is_canceled"].count().plot(kind='bar',figsize=(5,3));
In [165]:
# Checking if any of the independent numerical variables are highly correlated or are describing the same thing!
corrMatrix = hotel.corr()
corrMatrix.style.background_gradient(cmap='coolwarm')
# Some accidental correlations are observed (like stay in week vs weekend days) but overall, 
# numerical independent variables are not highly correlated.
Out[165]:
  is_canceled lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest previous_cancellations previous_bookings_not_canceled booking_changes days_in_waiting_list adr required_car_parking_spaces total_of_special_requests Difference
is_canceled 1.000000 0.292930 0.016694 0.008299 -0.005902 -0.001316 0.025549 0.058155 0.004862 -0.032566 -0.083740 0.110147 -0.057363 -0.144821 0.054309 0.046558 -0.195696 -0.234925 0.615310
lead_time 0.292930 1.000000 0.040285 0.127060 0.002272 0.085981 0.166892 0.117601 -0.037886 -0.021006 -0.123217 0.086023 -0.073603 0.002219 0.170007 -0.065068 -0.116634 -0.095924 0.560562
arrival_date_year 0.016694 0.040285 1.000000 -0.540378 -0.000179 0.021685 0.031198 0.030305 0.054698 -0.013197 0.010272 -0.119916 0.029231 0.031398 -0.056357 0.198367 -0.013826 0.108663 0.048352
arrival_date_week_number 0.008299 0.127060 -0.540378 1.000000 0.066587 0.018630 0.016048 0.026559 0.005559 0.010418 -0.031123 0.035494 -0.021008 0.006316 0.022679 0.076302 0.001983 0.026192 0.045317
arrival_date_day_of_month -0.005902 0.002272 -0.000179 0.066587 1.000000 -0.016241 -0.028381 -0.001728 0.014541 -0.000238 -0.006478 -0.027031 -0.000309 0.011254 0.022528 0.030234 0.008561 0.003058 -0.016418
stays_in_weekend_nights -0.001316 0.085981 0.021685 0.018630 -0.016241 1.000000 0.494173 0.094777 0.046134 0.018606 -0.086011 -0.012769 -0.042860 0.050190 -0.054401 0.050652 -0.018521 0.073139 -0.049712
stays_in_week_nights 0.025549 0.166892 0.031198 0.016048 -0.028381 0.494173 1.000000 0.096222 0.044651 0.020373 -0.095305 -0.013977 -0.048874 0.080018 -0.002026 0.066828 -0.024934 0.068745 0.016530
adults 0.058155 0.117601 0.030305 0.026559 -0.001728 0.094777 0.096222 1.000000 0.029416 0.017892 -0.140972 -0.007068 -0.108856 -0.041465 -0.008362 0.224302 0.014444 0.123324 0.067498
children 0.004862 -0.037886 0.054698 0.005559 0.014541 0.046134 0.044651 0.029416 1.000000 0.023999 -0.032477 -0.024755 -0.021079 0.050997 -0.033294 0.325058 0.056245 0.081756 -0.018232
babies -0.032566 -0.021006 -0.013197 0.010418 -0.000238 0.018606 0.020373 0.017892 0.023999 1.000000 -0.008813 -0.007509 -0.006552 0.085605 -0.010627 0.029041 0.037389 0.097943 -0.026516
is_repeated_guest -0.083740 -0.123217 0.010272 -0.031123 -0.006478 -0.086011 -0.095305 -0.140972 -0.032477 -0.008813 1.000000 0.082740 0.420642 0.013042 -0.022058 -0.130821 0.077926 0.012968 -0.036032
previous_cancellations 0.110147 0.086023 -0.119916 0.035494 -0.027031 -0.012769 -0.013977 -0.007068 -0.024755 -0.007509 0.082740 1.000000 0.152570 -0.027262 0.005941 -0.065982 -0.018541 -0.048485 0.113764
previous_bookings_not_canceled -0.057363 -0.073603 0.029231 -0.021008 -0.000309 -0.042860 -0.048874 -0.108856 -0.021079 -0.006552 0.420642 0.152570 1.000000 0.011962 -0.009417 -0.072342 0.047506 0.037778 -0.038162
booking_changes -0.144821 0.002219 0.031398 0.006316 0.011254 0.050190 0.080018 -0.041465 0.050997 0.085605 0.013042 -0.027262 0.011962 1.000000 -0.011918 0.026586 0.067487 0.055014 -0.113227
days_in_waiting_list 0.054309 0.170007 -0.056357 0.022679 0.022528 -0.054401 -0.002026 -0.008362 -0.033294 -0.010627 -0.022058 0.005941 -0.009417 -0.011918 1.000000 -0.040867 -0.030603 -0.082752 0.039739
adr 0.046558 -0.065068 0.198367 0.076302 0.030234 0.050652 0.066828 0.224302 0.325058 0.029041 -0.130821 -0.065982 -0.072342 0.026586 -0.040867 1.000000 0.056500 0.172361 -0.024432
required_car_parking_spaces -0.195696 -0.116634 -0.013826 0.001983 0.008561 -0.018521 -0.024934 0.014444 0.056245 0.037389 0.077926 -0.018541 0.047506 0.067487 -0.030603 0.056500 1.000000 0.082726 -0.119616
total_of_special_requests -0.234925 -0.095924 0.108663 0.026192 0.003058 0.073139 0.068745 0.123324 0.081756 0.097943 0.012968 -0.048485 0.037778 0.055014 -0.082752 0.172361 0.082726 1.000000 -0.202507
Difference 0.615310 0.560562 0.048352 0.045317 -0.016418 -0.049712 0.016530 0.067498 -0.018232 -0.026516 -0.036032 0.113764 -0.038162 -0.113227 0.039739 -0.024432 -0.119616 -0.202507 1.000000
In [32]:
corrMatrix["is_canceled"].sort_values(ascending=False)
Out[32]:
is_canceled                       1.00000
lead_time                         0.29293
previous_cancellations            0.11015
adults                            0.05816
days_in_waiting_list              0.05431
adr                               0.04656
stays_in_week_nights              0.02555
arrival_date_year                 0.01669
arrival_date_week_number          0.00830
children                          0.00486
stays_in_weekend_nights          -0.00132
arrival_date_day_of_month        -0.00590
babies                           -0.03257
previous_bookings_not_canceled   -0.05736
is_repeated_guest                -0.08374
booking_changes                  -0.14482
required_car_parking_spaces      -0.19570
total_of_special_requests        -0.23493
Name: is_canceled, dtype: float64
In [33]:
# Variables lead_time and total_of_special_requests have the highest positive and negative 
# correlation with reservation status (is or is not canceled), respectively
In [34]:
# Relation between lead time and cancellations:
hotel.groupby(["is_canceled"])["lead_time"].mean()  # on average, canceled bookings have higher mean lead_time
Out[34]:
is_canceled
0     80.08203
1    144.89911
Name: lead_time, dtype: float64
In [35]:
ct1 = pd.crosstab(hotel.total_of_special_requests, hotel.is_canceled)
ct1.plot(kind="bar", stacked=True, figsize=(5,3))   # The more the # of requests, the less the likelihood of cancellation
Out[35]:
<AxesSubplot:xlabel='total_of_special_requests'>
In [36]:
# Checking the relation between arrival year and cancellations
hotel.groupby(['arrival_date_year', 'is_canceled']).size().to_frame(name='percentage').groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
Out[36]:
percentage
arrival_date_year is_canceled
2015 0 62.95133
1 37.04867
2016 0 64.10646
1 35.89354
2017 0 61.26539
1 38.73461
In [37]:
# Checking the relation between arrival month and cancellations
hotel.groupby(['arrival_date_month', 'is_canceled']).size().to_frame(name='percentage').groupby(level=0).apply(lambda x:100 * x / float(x.sum()))
Out[37]:
percentage
arrival_date_month is_canceled
April 0 59.21647
1 40.78353
August 0 62.23569
1 37.76431
December 0 64.96523
1 35.03477
February 0 66.55489
1 33.44511
January 0 69.49840
1 30.50160
July 0 62.53559
1 37.46441
June 0 58.51405
1 41.48595
March 0 67.77232
1 32.22768
May 0 60.29711
1 39.70289
November 0 68.69000
1 31.31000
October 0 61.90903
1 38.09097
September 0 60.80952
1 39.19048
In [38]:
# Cancellations doesn't seem to be very highly dependent on the arrival year or month.
In [39]:
# Listing the unique values in each categorical column
categorical_hotel = hotel[hotel.columns[np.array(hotel.dtypes==object)]]

for col in categorical_hotel:
  print(f"`{col}` column unique values:\n {hotel[col].unique()}\n")
`hotel` column unique values:
 ['Resort Hotel' 'City Hotel']

`arrival_date_month` column unique values:
 ['July' 'August' 'September' 'October' 'November' 'December' 'January'
 'February' 'March' 'April' 'May' 'June']

`meal` column unique values:
 ['BB' 'FB' 'HB' 'SC']

`country` column unique values:
 ['PRT' 'listed_other' 'FRA' 'unknown' 'DEU']

`market_segment` column unique values:
 ['Direct' 'Corporate' 'Online TA' 'Offline TA/TO' 'Complementary' 'Groups'
 'Aviation']

`distribution_channel` column unique values:
 ['Direct' 'Corporate' 'TA/TO' 'Undefined' 'GDS']

`reserved_room_type` column unique values:
 ['C' 'A' 'D' 'E' 'G' 'F' 'H' 'L' 'B']

`assigned_room_type` column unique values:
 ['C' 'A' 'D' 'E' 'G' 'F' 'I' 'B' 'H' 'L' 'K']

`deposit_type` column unique values:
 ['No Deposit' 'Refundable' 'Non Refund']

`agent` column unique values:
 ['unknown' 'listed_other' '1']

`customer_type` column unique values:
 ['Transient' 'Contract' 'Transient-Party' 'Group']

`reservation_status` column unique values:
 ['Check-Out' 'Canceled' 'No-Show']

In [40]:
hotel.head(3)
Out[40]:
hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults ... booking_changes deposit_type agent days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests reservation_status reservation_status_date
0 Resort Hotel 0 342 2015 July 27 1 0 0 2 ... 3 No Deposit unknown 0 Transient 0.0 0 0 Check-Out 2015-07-01
1 Resort Hotel 0 737 2015 July 27 1 0 0 2 ... 4 No Deposit unknown 0 Transient 0.0 0 0 Check-Out 2015-07-01
2 Resort Hotel 0 7 2015 July 27 1 0 1 1 ... 0 No Deposit unknown 0 Transient 75.0 0 0 Check-Out 2015-07-02

3 rows × 31 columns

2.4 - Feature Engineering:¶

In [41]:
# Subtract the reservation_status_date which is the date for most recent status update, from
# the arrival date to better identify whether a canceled case would directly translate to 
# loss of profit or not

# First, it is required to concatenate the arrival columns (year, month, day) and generate an arrival_date column
import calendar
import datetime

arrival_Df = hotel[["arrival_date_year", "arrival_date_month", "arrival_date_day_of_month"]].copy()
arrival_Df.head()
months = list(calendar.month_name)


for month in months:
    arrival_Df.loc[arrival_Df["arrival_date_month"] == month, "arrival_date_month"] = months.index(month)
    
arrival_Df = arrival_Df.astype({'arrival_date_month': 'int64'})
arrival_Df.rename(columns = {'arrival_date_year':'year', 'arrival_date_month':'month', 'arrival_date_day_of_month':'day'}, inplace = True)
arri_date = pd.to_datetime(arrival_Df)
arrival_Df["arrival_date"] = arri_date

display(arrival_Df.head())
arrival_Df.info()
year month day arrival_date
0 2015 7 1 2015-07-01
1 2015 7 1 2015-07-01
2 2015 7 1 2015-07-01
3 2015 7 1 2015-07-01
4 2015 7 1 2015-07-01
<class 'pandas.core.frame.DataFrame'>
Int64Index: 119206 entries, 0 to 119389
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   year          119206 non-null  int64         
 1   month         119206 non-null  int64         
 2   day           119206 non-null  int64         
 3   arrival_date  119206 non-null  datetime64[ns]
dtypes: datetime64[ns](1), int64(3)
memory usage: 4.5 MB
In [42]:
# Temporarily adding these new columns to the hotel dataframe:
hotel["arrival_date"] = arri_date
hotel['Difference'] = (hotel['arrival_date'] - hotel['reservation_status_date']).dt.days

# Now, generating a rule for what is profit and what can be a loss of profit, based on the date differences:
# If the reservation is canceled and the deposit is non-refundable or canceled way ahead of arrival (over 200 days),
# then the hotel gains profit by forfeiting the deposit
# If the guest has already checked out, then it's a profit case!
# If it is a no-show and the deposit is non-refundable, then it's a profit case!
profit = hotel.query('(reservation_status=="Canceled" & (deposit_type == "Non Refund" | Difference > 200)) | reservation_status=="Check-Out" | (reservation_status=="No-Show" & deposit_type == "Non Refund")').copy()

# Loss is a collection of instances not captured by the above query
loss = hotel.loc[hotel.index.difference(profit.index)].copy()

# The shapes row numbers add up to the total rows 119,206
print(profit.shape)
print(loss.shape)

display(profit.head())
display(loss.head())
(91487, 33)
(27719, 33)
hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults ... agent days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests reservation_status reservation_status_date arrival_date Difference
0 Resort Hotel 0 342 2015 July 27 1 0 0 2 ... unknown 0 Transient 0.0 0 0 Check-Out 2015-07-01 2015-07-01 0
1 Resort Hotel 0 737 2015 July 27 1 0 0 2 ... unknown 0 Transient 0.0 0 0 Check-Out 2015-07-01 2015-07-01 0
2 Resort Hotel 0 7 2015 July 27 1 0 1 1 ... unknown 0 Transient 75.0 0 0 Check-Out 2015-07-02 2015-07-01 -1
3 Resort Hotel 0 13 2015 July 27 1 0 1 1 ... listed_other 0 Transient 75.0 0 0 Check-Out 2015-07-02 2015-07-01 -1
4 Resort Hotel 0 14 2015 July 27 1 0 2 2 ... listed_other 0 Transient 98.0 0 1 Check-Out 2015-07-03 2015-07-01 -2

5 rows × 33 columns

hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults ... agent days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests reservation_status reservation_status_date arrival_date Difference
8 Resort Hotel 1 85 2015 July 27 1 0 3 2 ... listed_other 0 Transient 82.0 0 1 Canceled 2015-05-06 2015-07-01 56
9 Resort Hotel 1 75 2015 July 27 1 0 3 2 ... listed_other 0 Transient 105.5 0 0 Canceled 2015-04-22 2015-07-01 70
10 Resort Hotel 1 23 2015 July 27 1 0 4 2 ... listed_other 0 Transient 123.0 0 0 Canceled 2015-06-23 2015-07-01 8
27 Resort Hotel 1 60 2015 July 27 1 2 5 2 ... listed_other 0 Transient 107.0 0 2 Canceled 2015-05-11 2015-07-01 51
32 Resort Hotel 1 96 2015 July 27 1 2 8 2 ... unknown 0 Transient 108.3 0 2 Canceled 2015-05-29 2015-07-01 33

5 rows × 33 columns

In [43]:
# Creating a Revenue column for profit and loss dataframes and populating them:
profit["Revenue"] = 1
loss["Revenue"] = 0
hotel_2 = pd.concat([profit, loss], ignore_index=True)  
hotel_2 = hotel_2.sample(frac=1).reset_index(drop=True)  # Shuffle and reset index so profit and loss rows get shuffled
hotel_2.shape
Out[43]:
(119206, 34)
In [44]:
hotel_2.head(3)
Out[44]:
hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults ... days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests reservation_status reservation_status_date arrival_date Difference Revenue
0 City Hotel 0 151 2016 August 32 5 1 2 2 ... 0 Transient 72.25 0 1 Check-Out 2016-08-08 2016-08-05 -3 1
1 Resort Hotel 0 7 2017 June 25 23 0 2 2 ... 0 Transient 128.25 0 0 Check-Out 2017-06-25 2017-06-23 -2 1
2 City Hotel 0 34 2016 February 9 22 1 2 2 ... 0 Transient 77.60 0 2 Check-Out 2016-02-25 2016-02-22 -3 1

3 rows × 34 columns

In [45]:
hotel_2.columns
Out[45]:
Index(['hotel', 'is_canceled', 'lead_time', 'arrival_date_year',
       'arrival_date_month', 'arrival_date_week_number',
       'arrival_date_day_of_month', 'stays_in_weekend_nights',
       'stays_in_week_nights', 'adults', 'children', 'babies', 'meal',
       'country', 'market_segment', 'distribution_channel',
       'is_repeated_guest', 'previous_cancellations',
       'previous_bookings_not_canceled', 'reserved_room_type',
       'assigned_room_type', 'booking_changes', 'deposit_type', 'agent',
       'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests',
       'reservation_status', 'reservation_status_date', 'arrival_date',
       'Difference', 'Revenue'],
      dtype='object')

I can now safely drop the columns:¶

is_canceled: because we have a new target 'Revenue' which is derived from the reservation status itself
reservation_status_date: We used this date to engineer the target column, Revenue
arrival_date: This was added temporarily to engineer the target column, Revenue
Difference: This was added temporarily to see the difference between the arrival time and last status update date
reservation_status: We used this to engineer the target column, Revenue

In [46]:
# drop these columns to avoid information leakage becase the target is created using these columns:
hotel_2.drop(columns = ['is_canceled', 'reservation_status_date', 'arrival_date', 'Difference', 'reservation_status'], inplace=True)
hotel_2.columns
Out[46]:
Index(['hotel', 'lead_time', 'arrival_date_year', 'arrival_date_month',
       'arrival_date_week_number', 'arrival_date_day_of_month',
       'stays_in_weekend_nights', 'stays_in_week_nights', 'adults', 'children',
       'babies', 'meal', 'country', 'market_segment', 'distribution_channel',
       'is_repeated_guest', 'previous_cancellations',
       'previous_bookings_not_canceled', 'reserved_room_type',
       'assigned_room_type', 'booking_changes', 'deposit_type', 'agent',
       'days_in_waiting_list', 'customer_type', 'adr',
       'required_car_parking_spaces', 'total_of_special_requests', 'Revenue'],
      dtype='object')
In [47]:
hotel_2.head(3)
Out[47]:
hotel lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children ... assigned_room_type booking_changes deposit_type agent days_in_waiting_list customer_type adr required_car_parking_spaces total_of_special_requests Revenue
0 City Hotel 151 2016 August 32 5 1 2 2 0 ... A 0 No Deposit listed_other 0 Transient 72.25 0 1 1
1 Resort Hotel 7 2017 June 25 23 0 2 2 0 ... A 0 No Deposit listed_other 0 Transient 128.25 0 0 1
2 City Hotel 34 2016 February 9 22 1 2 2 1 ... D 0 No Deposit listed_other 0 Transient 77.60 0 2 1

3 rows × 29 columns

2.5 Data Visualization for Target Variable¶

Now, our new target variable is Revenue.

In [48]:
revenue_percentages = hotel_2["Revenue"].value_counts(normalize=True)*100
print(revenue_percentages)
revenue_percentages.plot(kind='bar', figsize=(4,3), ylabel="percentage");
1    76.74698
0    23.25302
Name: Revenue, dtype: float64
In [49]:
# Create dummy variables
hotel_ssf = pd.get_dummies(hotel_2)
print(f"There are now {hotel_ssf.shape[1]} columns in hotel_ssf dataframe.")
There are now 83 columns in hotel_ssf dataframe.
In [50]:
# Among the important independent variables that are correlated to Revenue are:
In [51]:
corrMatrix = hotel_ssf.corr()
corrMatrix["Revenue"].sort_values(ascending=False)[:5]
Out[51]:
Revenue                         1.00000
deposit_type_Non Refund         0.20554
required_car_parking_spaces     0.14034
market_segment_Offline TA/TO    0.11960
market_segment_Groups           0.10615
Name: Revenue, dtype: float64
In [52]:
corrMatrix["Revenue"].sort_values(ascending=False)[-5:]
Out[52]:
distribution_channel_TA/TO   -0.09863
adr                          -0.12158
agent_listed_other           -0.12832
deposit_type_No Deposit      -0.20466
market_segment_Online TA     -0.24340
Name: Revenue, dtype: float64
In [53]:
pd.crosstab(hotel_ssf["deposit_type_Non Refund"],hotel_ssf.Revenue).plot.bar(figsize=(4,3), ylabel="count");
plt.show()
In [54]:
pd.crosstab(hotel_ssf["deposit_type_No Deposit"],hotel_ssf.Revenue).plot.bar(figsize=(4,3), ylabel="count");
plt.show()
In [55]:
pd.crosstab(hotel_ssf["market_segment_Online TA"],hotel_ssf.Revenue).plot.bar(figsize=(4,3), ylabel="count");
plt.show()

Preparing the data and labels for classification¶

In [57]:
# Data:    << hotel_ssf>>    is all numeric (includes dummy variables)
# Recall that hotel_2 still includes categorical variables that are not converted to dummies

# separate the Revenue column to be used as the target:
y = hotel_ssf["Revenue"]

X_ssf = hotel_ssf.drop(["Revenue"], axis=1)
In [58]:
# Creating a reduced version of the matrix X and label y to test run the classifiers easily

hotel_ssf_redu = hotel_ssf.sample(frac=0.12).reset_index(drop=True) #using only a fraction of rows
y = hotel_ssf_redu["Revenue"]
X_ssf = hotel_ssf_redu.drop(["Revenue"], axis=1)
In [59]:
revenue_percentages = hotel_ssf_redu["Revenue"].value_counts(normalize=True)*100
print(revenue_percentages)
revenue_percentages.plot(kind='bar', figsize=(4,3), ylabel="percentage");
1    76.28102
0    23.71898
Name: Revenue, dtype: float64

3 - Supervised Knowledge Discovery¶

3.1 KNN¶

First split, then train the minmax scalar on train set only; then use it to transform test:¶

In [60]:
from sklearn.model_selection import train_test_split
#Using 80/20 split
train, test, target_train, target_test = train_test_split(X_ssf, y, test_size=0.2, random_state=33) 

from sklearn import preprocessing
#Normalizing the data
min_max_scaler = preprocessing.MinMaxScaler().fit(train)
#Min-Max normalization to scale all the variables between 0 & 1
train_norm = min_max_scaler.transform(train)
train_norm = pd.DataFrame(train_norm, columns=train.columns, index=train.index)

test_norm = min_max_scaler.transform(test)
test_norm = pd.DataFrame(test_norm, columns=test.columns, index=test.index)

target = y

train = train_norm
test= test_norm
In [61]:
train.shape
Out[61]:
(11444, 82)
In [62]:
test.shape
Out[62]:
(2861, 82)
In [63]:
from sklearn.model_selection import GridSearchCV
from sklearn import neighbors
#Intializing KNN classifier
clf = neighbors.KNeighborsClassifier()    
#Parameters for grid search (values of k=1, k=3, k=5, k=7, k=9)
parameters = {'n_neighbors': [1,3,5,7,9]}          
#Initializing the grid search with specified parameters & 5-fold cross-validation
gs = GridSearchCV(clf, parameters, verbose=1, cv=5)   
In [64]:
tar_train = target_train.astype('int32')
In [65]:
tar_test = target_test.astype('int32')
In [66]:
#Performing the grid search on the training data
gs.fit(train, tar_train)
Fitting 5 folds for each of 5 candidates, totalling 25 fits
Out[66]:
GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
             param_grid={'n_neighbors': [1, 3, 5, 7, 9]}, verbose=1)
In [67]:
for (i, j) in gs.best_params_.items():
    print ("The optimal value of", i, "is:", j)
print()
print("The best cross-validation accuracy on the training data was: {}".format(gs.best_score_))
The optimal value of n_neighbors is: 7

The best cross-validation accuracy on the training data was: 0.7637194609671674
In [124]:
from sklearn.metrics import classification_report
#Initializing with k=7
clf = neighbors.KNeighborsClassifier(n_neighbors=7)  
#Fitting the training data 
clf.fit(train, tar_train) 
#Predict classes of test set
pred_test_target = clf.predict(test)   
#Print classification report
print(classification_report(tar_test, pred_test_target))  
              precision    recall  f1-score   support

           0       0.55      0.42      0.48       702
           1       0.82      0.89      0.85      2159

    accuracy                           0.77      2861
   macro avg       0.69      0.65      0.66      2861
weighted avg       0.76      0.77      0.76      2861

In [145]:
from sklearn.model_selection import cross_val_score

clf = neighbors.KNeighborsClassifier(n_neighbors=3)
cv_scores = cross_val_score(clf4, train, tar_train, cv=10)

print(f"\nCross validation scores:\n{cv_scores}")
print("\n Overall Accuracy on X-Val: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))

clf.fit(train, tar_train)
print("\n Accuracy on Training: ",  clf4.score(train, tar_train))

clf.fit(test, tar_test)
print("\n Accuracy on Testing: ",  clf4.score(test, tar_test))
Cross validation scores:
[0.76157 0.78865 0.78253 0.79127 0.77797 0.78147 0.80332 0.77448 0.79458
 0.79808]

 Overall Accuracy on X-Val: 0.79 (+/- 0.02)

 Accuracy on Training:  0.932278923453338

 Accuracy on Testing:  0.787836420831877

3.2 Decision Trees¶

In [69]:
# Splitting the data into train(80%) and test(20%) sets:
from sklearn.model_selection import train_test_split

train1, test1, target_train1, target_test1 = train_test_split(X_ssf, y, test_size=0.2, random_state=33)
In [70]:
from sklearn import tree
treeclf = tree.DecisionTreeClassifier(criterion='entropy', min_samples_split=3)
In [71]:
treeclf = treeclf.fit(train1, target_train1)
In [72]:
treepreds_test = treeclf.predict(test1)
print(treepreds_test)
[0 0 1 ... 1 1 1]
In [73]:
from sklearn.metrics import classification_report, accuracy_score, recall_score, precision_score, confusion_matrix, roc_curve, roc_auc_score,plot_confusion_matrix


print(classification_report(target_test1, treepreds_test))
              precision    recall  f1-score   support

           0       0.59      0.62      0.60       702
           1       0.87      0.86      0.87      2159

    accuracy                           0.80      2861
   macro avg       0.73      0.74      0.74      2861
weighted avg       0.80      0.80      0.80      2861

In [74]:
treecm = confusion_matrix(target_test1, treepreds_test)
print(treecm)
[[ 432  270]
 [ 298 1861]]
In [75]:
import pylab as plt
%matplotlib inline
plt.matshow(treecm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
In [76]:
pr = treeclf.predict_proba(test1)
In [77]:
import sklearn.metrics as metrics
prd = pr[:,1]
FalsePR, TruePR, threshold = metrics.roc_curve(target_test1, prd)
auc = metrics.auc(FalsePR, TruePR)
In [78]:
plt.title('ROC Curve')
plt.plot(FalsePR, TruePR, 'b', label = 'AUC = %0.2f' % auc)
plt.legend(loc = 'lower right')
plt.plot([0, 1], [0, 1],'r--')
plt.ylabel('TPR')
plt.xlabel('FPR')
plt.show()
In [79]:
treepreds_train = treeclf.predict(train1)
print(treepreds_train)
[1 1 0 ... 1 1 1]
In [80]:
print(classification_report(target_train1, treepreds_train))
              precision    recall  f1-score   support

           0       0.97      1.00      0.98      2691
           1       1.00      0.99      1.00      8753

    accuracy                           0.99     11444
   macro avg       0.99      0.99      0.99     11444
weighted avg       0.99      0.99      0.99     11444

In [81]:
treecm1 = confusion_matrix(target_train1, treepreds_train)
print(treecm1)
[[2685    6]
 [  76 8677]]
In [82]:
print (treeclf.score(train1, target_train1))
0.9928346731911919
In [83]:
print (treeclf.score(test1, target_test1))
0.8014680181754631
In [84]:
#Grid Search
from sklearn.model_selection import GridSearchCV

dt = tree.DecisionTreeClassifier()

parameters = {
    'criterion': ['entropy','gini'],
    'max_depth': np.linspace(1, 20, 10, dtype=int),
    'min_samples_leaf': np.linspace(1, 30, 15, dtype=int),
    'min_samples_split': np.linspace(2, 20, 10, dtype=int)
}

gs = GridSearchCV(dt, parameters, verbose=1, cv=5)
In [85]:
%time _ = gs.fit(train1, target_train1)

gs.best_params_, gs.best_score_
Fitting 5 folds for each of 3000 candidates, totalling 15000 fits
CPU times: user 8min 5s, sys: 1.19 s, total: 8min 6s
Wall time: 8min 7s
Out[85]:
({'criterion': 'entropy',
  'max_depth': 11,
  'min_samples_leaf': 17,
  'min_samples_split': 4},
 0.8218290501547383)
In [87]:
from sklearn import metrics

def measure_performance(X, y, clf, show_accuracy=True, show_classification_report=True, show_confussion_matrix=True):
    y_pred = clf.predict(X)   
    if show_accuracy:
         print("Accuracy:{0:.3f}".format(metrics.accuracy_score(y, y_pred)),"\n")
    if show_classification_report:
        print("Classification report")
        print(metrics.classification_report(y, y_pred),"\n")
      
    if show_confussion_matrix:
        print("Confussion matrix")
        print(metrics.confusion_matrix(y, y_pred),"\n")
In [88]:
dtGS = tree.DecisionTreeClassifier(criterion='entropy', max_depth=11, min_samples_leaf=17, min_samples_split=4)

dtGS.fit(train1, target_train1)
measure_performance(test1, target_test1, dtGS, show_confussion_matrix=False, show_classification_report=True)
Accuracy:0.810 

Classification report
              precision    recall  f1-score   support

           0       0.63      0.54      0.58       702
           1       0.86      0.90      0.88      2159

    accuracy                           0.81      2861
   macro avg       0.74      0.72      0.73      2861
weighted avg       0.80      0.81      0.80      2861
 

In [146]:
from sklearn.model_selection import cross_val_score

treeclf = tree.DecisionTreeClassifier(criterion='entropy', max_depth=11, min_samples_leaf=17, min_samples_split=4)
cv_scores = cross_val_score(treeclf, train1, target_train1, cv=10)

print(f"\nCross validation scores:\n{cv_scores}")
print("\n Overall Accuracy on X-Val: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))

treeclf = treeclf.fit(train1, target_train1)
print("\n Accuracy on Training: ",  treeclf.score(train1, target_train1))

treeclf = treeclf.fit(test1, target_test1)
print("\n Accuracy on Testing: ",  treeclf.score(test1, target_test1))
Cross validation scores:
[0.81048 0.82707 0.81135 0.80961 0.81294 0.81469 0.81031 0.82168 0.8278
 0.83129]

 Overall Accuracy on X-Val: 0.82 (+/- 0.02)

 Accuracy on Training:  0.8405277874868927

 Accuracy on Testing:  0.8276826284515904
In [89]:
from sklearn.tree import export_graphviz
export_graphviz(treeclf,out_file='tree.dot', feature_names=train1.columns)
In [164]:
import graphviz

with open("tree.dot") as f:
    dot_graph = f.read()
graphviz.Source(dot_graph, format = "png")
Out[164]:
Tree 0 deposit_type_Non Refund <= 0.5 entropy = 0.787 samples = 11444 value = [2691, 8753] 1 required_car_parking_spaces <= 0.5 entropy = 0.837 samples = 10073 value = [2691, 7382] 0->1 True 2860 entropy = 0.0 samples = 1371 value = [0, 1371] 0->2860 False 2 market_segment_Online TA <= 0.5 entropy = 0.866 samples = 9351 value = [2691, 6660] 1->2 2859 entropy = 0.0 samples = 722 value = [0, 722] 1->2859 3 country_PRT <= 0.5 entropy = 0.71 samples = 4276 value = [830, 3446] 2->3 916 total_of_special_requests <= 0.5 entropy = 0.948 samples = 5075 value = [1861, 3214] 2->916 4 distribution_channel_TA/TO <= 0.5 entropy = 0.277 samples = 2321 value = [111, 2210] 3->4 207 lead_time <= 24.5 entropy = 0.949 samples = 1955 value = [719, 1236] 3->207 5 agent_unknown <= 0.5 entropy = 0.526 samples = 791 value = [94, 697] 4->5 180 previous_cancellations <= 0.5 entropy = 0.088 samples = 1530 value = [17, 1513] 4->180 6 arrival_date_year <= 2015.5 entropy = 0.696 samples = 422 value = [79, 343] 5->6 141 country_unknown <= 0.5 entropy = 0.245 samples = 369 value = [15, 354] 5->141 7 entropy = 0.0 samples = 43 value = [0, 43] 6->7 8 stays_in_week_nights <= 2.5 entropy = 0.738 samples = 379 value = [79, 300] 6->8 9 adr <= 95.3 entropy = 0.563 samples = 197 value = [26, 171] 8->9 64 lead_time <= 156.0 entropy = 0.87 samples = 182 value = [53, 129] 8->64 10 arrival_date_week_number <= 8.0 entropy = 0.191 samples = 68 value = [2, 66] 9->10 21 arrival_date_day_of_month <= 15.5 entropy = 0.693 samples = 129 value = [24, 105] 9->21 11 adr <= 68.53 entropy = 0.469 samples = 20 value = [2, 18] 10->11 20 entropy = 0.0 samples = 48 value = [0, 48] 10->20 12 entropy = 0.0 samples = 11 value = [0, 11] 11->12 13 adr <= 79.6 entropy = 0.764 samples = 9 value = [2, 7] 11->13 14 arrival_date_day_of_month <= 13.5 entropy = 0.971 samples = 5 value = [2, 3] 13->14 19 entropy = 0.0 samples = 4 value = [0, 4] 13->19 15 entropy = 0.0 samples = 2 value = [0, 2] 14->15 16 country_listed_other <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 14->16 17 entropy = 0.0 samples = 1 value = [1, 0] 16->17 18 entropy = 1.0 samples = 2 value = [1, 1] 16->18 22 stays_in_weekend_nights <= 0.5 entropy = 0.404 samples = 62 value = [5, 57] 21->22 35 arrival_date_week_number <= 33.5 entropy = 0.86 samples = 67 value = [19, 48] 21->35 23 entropy = 0.0 samples = 25 value = [0, 25] 22->23 24 adr <= 116.55 entropy = 0.571 samples = 37 value = [5, 32] 22->24 25 entropy = 0.0 samples = 16 value = [0, 16] 24->25 26 arrival_date_day_of_month <= 11.5 entropy = 0.792 samples = 21 value = [5, 16] 24->26 27 children <= 0.5 entropy = 0.918 samples = 15 value = [5, 10] 26->27 34 entropy = 0.0 samples = 6 value = [0, 6] 26->34 28 adr <= 155.8 entropy = 1.0 samples = 10 value = [5, 5] 27->28 33 entropy = 0.0 samples = 5 value = [0, 5] 27->33 29 adr <= 124.5 entropy = 0.863 samples = 7 value = [2, 5] 28->29 32 entropy = 0.0 samples = 3 value = [3, 0] 28->32 30 entropy = 0.0 samples = 2 value = [2, 0] 29->30 31 entropy = 0.0 samples = 5 value = [0, 5] 29->31 36 booking_changes <= 0.5 entropy = 0.706 samples = 52 value = [10, 42] 35->36 57 adr <= 163.665 entropy = 0.971 samples = 15 value = [9, 6] 35->57 37 reserved_room_type_D <= 0.5 entropy = 0.811 samples = 40 value = [10, 30] 36->37 56 entropy = 0.0 samples = 12 value = [0, 12] 36->56 38 assigned_room_type_E <= 0.5 entropy = 0.907 samples = 31 value = [10, 21] 37->38 55 entropy = 0.0 samples = 9 value = [0, 9] 37->55 39 adr <= 147.0 entropy = 0.826 samples = 27 value = [7, 20] 38->39 52 distribution_channel_Direct <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 38->52 40 arrival_date_week_number <= 22.0 entropy = 0.977 samples = 17 value = [7, 10] 39->40 51 entropy = 0.0 samples = 10 value = [0, 10] 39->51 41 lead_time <= 182.5 entropy = 0.503 samples = 9 value = [1, 8] 40->41 44 adr <= 110.415 entropy = 0.811 samples = 8 value = [6, 2] 40->44 42 entropy = 0.0 samples = 8 value = [0, 8] 41->42 43 entropy = 0.0 samples = 1 value = [1, 0] 41->43 45 meal_SC <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 44->45 50 entropy = 0.0 samples = 4 value = [4, 0] 44->50 46 lead_time <= 222.5 entropy = 0.918 samples = 3 value = [2, 1] 45->46 49 entropy = 0.0 samples = 1 value = [0, 1] 45->49 47 entropy = 0.0 samples = 2 value = [2, 0] 46->47 48 entropy = 0.0 samples = 1 value = [0, 1] 46->48 53 entropy = 0.0 samples = 1 value = [0, 1] 52->53 54 entropy = 0.0 samples = 3 value = [3, 0] 52->54 58 stays_in_weekend_nights <= 1.5 entropy = 0.918 samples = 9 value = [3, 6] 57->58 63 entropy = 0.0 samples = 6 value = [6, 0] 57->63 59 lead_time <= 210.0 entropy = 0.592 samples = 7 value = [1, 6] 58->59 62 entropy = 0.0 samples = 2 value = [2, 0] 58->62 60 entropy = 0.0 samples = 5 value = [0, 5] 59->60 61 entropy = 1.0 samples = 2 value = [1, 1] 59->61 65 stays_in_weekend_nights <= 2.5 entropy = 0.94 samples = 126 value = [45, 81] 64->65 122 adr <= 154.9 entropy = 0.592 samples = 56 value = [8, 48] 64->122 66 booking_changes <= 0.5 entropy = 0.904 samples = 119 value = [38, 81] 65->66 121 entropy = 0.0 samples = 7 value = [7, 0] 65->121 67 arrival_date_month_February <= 0.5 entropy = 0.967 samples = 84 value = [33, 51] 66->67 110 lead_time <= 69.5 entropy = 0.592 samples = 35 value = [5, 30] 66->110 68 distribution_channel_Corporate <= 0.5 entropy = 0.983 samples = 78 value = [33, 45] 67->68 109 entropy = 0.0 samples = 6 value = [0, 6] 67->109 69 arrival_date_week_number <= 11.5 entropy = 0.993 samples = 73 value = [33, 40] 68->69 108 entropy = 0.0 samples = 5 value = [0, 5] 68->108 70 arrival_date_day_of_month <= 5.0 entropy = 0.592 samples = 7 value = [6, 1] 69->70 73 arrival_date_month_November <= 0.5 entropy = 0.976 samples = 66 value = [27, 39] 69->73 71 entropy = 0.0 samples = 1 value = [0, 1] 70->71 72 entropy = 0.0 samples = 6 value = [6, 0] 70->72 74 stays_in_week_nights <= 5.5 entropy = 0.959 samples = 63 value = [24, 39] 73->74 107 entropy = 0.0 samples = 3 value = [3, 0] 73->107 75 stays_in_week_nights <= 3.5 entropy = 0.943 samples = 61 value = [22, 39] 74->75 106 entropy = 0.0 samples = 2 value = [2, 0] 74->106 76 arrival_date_day_of_month <= 9.5 entropy = 1.0 samples = 30 value = [15, 15] 75->76 93 adr <= 108.825 entropy = 0.771 samples = 31 value = [7, 24] 75->93 77 adr <= 123.75 entropy = 0.544 samples = 8 value = [7, 1] 76->77 80 lead_time <= 69.0 entropy = 0.946 samples = 22 value = [8, 14] 76->80 78 entropy = 1.0 samples = 2 value = [1, 1] 77->78 79 entropy = 0.0 samples = 6 value = [6, 0] 77->79 81 lead_time <= 11.0 entropy = 0.954 samples = 8 value = [5, 3] 80->81 86 arrival_date_month_June <= 0.5 entropy = 0.75 samples = 14 value = [3, 11] 80->86 82 entropy = 0.0 samples = 2 value = [0, 2] 81->82 83 adr <= 238.0 entropy = 0.65 samples = 6 value = [5, 1] 81->83 84 entropy = 0.0 samples = 5 value = [5, 0] 83->84 85 entropy = 0.0 samples = 1 value = [0, 1] 83->85 87 lead_time <= 149.5 entropy = 0.619 samples = 13 value = [2, 11] 86->87 92 entropy = 0.0 samples = 1 value = [1, 0] 86->92 88 adults <= 1.5 entropy = 0.414 samples = 12 value = [1, 11] 87->88 91 entropy = 0.0 samples = 1 value = [1, 0] 87->91 89 entropy = 0.0 samples = 1 value = [1, 0] 88->89 90 entropy = 0.0 samples = 11 value = [0, 11] 88->90 94 entropy = 0.0 samples = 8 value = [0, 8] 93->94 95 arrival_date_day_of_month <= 7.5 entropy = 0.887 samples = 23 value = [7, 16] 93->95 96 entropy = 0.0 samples = 6 value = [0, 6] 95->96 97 assigned_room_type_D <= 0.5 entropy = 0.977 samples = 17 value = [7, 10] 95->97 98 assigned_room_type_E <= 0.5 entropy = 0.918 samples = 15 value = [5, 10] 97->98 105 entropy = 0.0 samples = 2 value = [2, 0] 97->105 99 adr <= 116.25 entropy = 0.779 samples = 13 value = [3, 10] 98->99 104 entropy = 0.0 samples = 2 value = [2, 0] 98->104 100 entropy = 0.0 samples = 2 value = [2, 0] 99->100 101 meal_HB <= 0.5 entropy = 0.439 samples = 11 value = [1, 10] 99->101 102 entropy = 0.0 samples = 10 value = [0, 10] 101->102 103 entropy = 0.0 samples = 1 value = [1, 0] 101->103 111 entropy = 0.0 samples = 20 value = [0, 20] 110->111 112 stays_in_weekend_nights <= 1.5 entropy = 0.918 samples = 15 value = [5, 10] 110->112 113 entropy = 0.0 samples = 5 value = [0, 5] 112->113 114 lead_time <= 104.0 entropy = 1.0 samples = 10 value = [5, 5] 112->114 115 reserved_room_type_C <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 114->115 118 reserved_room_type_A <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 114->118 116 entropy = 0.0 samples = 4 value = [4, 0] 115->116 117 entropy = 0.0 samples = 1 value = [0, 1] 115->117 119 entropy = 0.0 samples = 4 value = [0, 4] 118->119 120 entropy = 0.0 samples = 1 value = [1, 0] 118->120 123 arrival_date_week_number <= 52.5 entropy = 0.371 samples = 42 value = [3, 39] 122->123 134 country_FRA <= 0.5 entropy = 0.94 samples = 14 value = [5, 9] 122->134 124 adr <= 88.8 entropy = 0.281 samples = 41 value = [2, 39] 123->124 133 entropy = 0.0 samples = 1 value = [1, 0] 123->133 125 lead_time <= 186.0 entropy = 0.503 samples = 18 value = [2, 16] 124->125 132 entropy = 0.0 samples = 23 value = [0, 23] 124->132 126 entropy = 0.0 samples = 1 value = [1, 0] 125->126 127 assigned_room_type_D <= 0.5 entropy = 0.323 samples = 17 value = [1, 16] 125->127 128 entropy = 0.0 samples = 13 value = [0, 13] 127->128 129 adr <= 70.555 entropy = 0.811 samples = 4 value = [1, 3] 127->129 130 entropy = 0.0 samples = 1 value = [1, 0] 129->130 131 entropy = 0.0 samples = 3 value = [0, 3] 129->131 135 lead_time <= 228.5 entropy = 0.811 samples = 12 value = [3, 9] 134->135 140 entropy = 0.0 samples = 2 value = [2, 0] 134->140 136 arrival_date_day_of_month <= 4.5 entropy = 0.469 samples = 10 value = [1, 9] 135->136 139 entropy = 0.0 samples = 2 value = [2, 0] 135->139 137 entropy = 0.0 samples = 1 value = [1, 0] 136->137 138 entropy = 0.0 samples = 9 value = [0, 9] 136->138 142 adr <= 93.75 entropy = 0.191 samples = 340 value = [10, 330] 141->142 169 distribution_channel_Corporate <= 0.5 entropy = 0.663 samples = 29 value = [5, 24] 141->169 143 entropy = 0.0 samples = 202 value = [0, 202] 142->143 144 previous_cancellations <= 0.5 entropy = 0.375 samples = 138 value = [10, 128] 142->144 145 booking_changes <= 0.5 entropy = 0.326 samples = 134 value = [8, 126] 144->145 166 country_listed_other <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 144->166 146 arrival_date_week_number <= 28.5 entropy = 0.429 samples = 91 value = [8, 83] 145->146 165 entropy = 0.0 samples = 43 value = [0, 43] 145->165 147 arrival_date_month_March <= 0.5 entropy = 0.146 samples = 48 value = [1, 47] 146->147 150 market_segment_Direct <= 0.5 entropy = 0.641 samples = 43 value = [7, 36] 146->150 148 entropy = 0.0 samples = 47 value = [0, 47] 147->148 149 entropy = 0.0 samples = 1 value = [1, 0] 147->149 151 country_DEU <= 0.5 entropy = 0.991 samples = 9 value = [4, 5] 150->151 158 meal_BB <= 0.5 entropy = 0.431 samples = 34 value = [3, 31] 150->158 152 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 6 value = [4, 2] 151->152 157 entropy = 0.0 samples = 3 value = [0, 3] 151->157 153 arrival_date_day_of_month <= 14.0 entropy = 0.722 samples = 5 value = [4, 1] 152->153 156 entropy = 0.0 samples = 1 value = [0, 1] 152->156 154 entropy = 0.0 samples = 3 value = [3, 0] 153->154 155 entropy = 1.0 samples = 2 value = [1, 1] 153->155 159 arrival_date_week_number <= 34.0 entropy = 1.0 samples = 4 value = [2, 2] 158->159 162 arrival_date_month_November <= 0.5 entropy = 0.211 samples = 30 value = [1, 29] 158->162 160 entropy = 0.0 samples = 2 value = [2, 0] 159->160 161 entropy = 0.0 samples = 2 value = [0, 2] 159->161 163 entropy = 0.0 samples = 29 value = [0, 29] 162->163 164 entropy = 0.0 samples = 1 value = [1, 0] 162->164 167 entropy = 0.0 samples = 2 value = [0, 2] 166->167 168 entropy = 0.0 samples = 2 value = [2, 0] 166->168 170 entropy = 0.0 samples = 10 value = [0, 10] 169->170 171 total_of_special_requests <= 1.0 entropy = 0.831 samples = 19 value = [5, 14] 169->171 172 adr <= 41.0 entropy = 0.672 samples = 17 value = [3, 14] 171->172 179 entropy = 0.0 samples = 2 value = [2, 0] 171->179 173 entropy = 0.0 samples = 8 value = [0, 8] 172->173 174 stays_in_week_nights <= 1.5 entropy = 0.918 samples = 9 value = [3, 6] 172->174 175 lead_time <= 0.5 entropy = 0.592 samples = 7 value = [1, 6] 174->175 178 entropy = 0.0 samples = 2 value = [2, 0] 174->178 176 entropy = 0.0 samples = 1 value = [1, 0] 175->176 177 entropy = 0.0 samples = 6 value = [0, 6] 175->177 181 stays_in_week_nights <= 1.5 entropy = 0.071 samples = 1526 value = [13, 1513] 180->181 206 entropy = 0.0 samples = 4 value = [4, 0] 180->206 182 arrival_date_year <= 2015.5 entropy = 0.209 samples = 364 value = [12, 352] 181->182 199 agent_unknown <= 0.5 entropy = 0.01 samples = 1162 value = [1, 1161] 181->199 183 arrival_date_week_number <= 34.5 entropy = 0.532 samples = 91 value = [11, 80] 182->183 194 arrival_date_month_September <= 0.5 entropy = 0.035 samples = 273 value = [1, 272] 182->194 184 adr <= 83.5 entropy = 1.0 samples = 22 value = [11, 11] 183->184 193 entropy = 0.0 samples = 69 value = [0, 69] 183->193 185 entropy = 0.0 samples = 9 value = [0, 9] 184->185 186 assigned_room_type_D <= 0.5 entropy = 0.619 samples = 13 value = [11, 2] 184->186 187 arrival_date_day_of_month <= 9.5 entropy = 0.414 samples = 12 value = [11, 1] 186->187 192 entropy = 0.0 samples = 1 value = [0, 1] 186->192 188 arrival_date_day_of_month <= 4.5 entropy = 0.918 samples = 3 value = [2, 1] 187->188 191 entropy = 0.0 samples = 9 value = [9, 0] 187->191 189 entropy = 0.0 samples = 1 value = [1, 0] 188->189 190 entropy = 1.0 samples = 2 value = [1, 1] 188->190 195 entropy = 0.0 samples = 251 value = [0, 251] 194->195 196 arrival_date_day_of_month <= 25.5 entropy = 0.267 samples = 22 value = [1, 21] 194->196 197 entropy = 0.0 samples = 21 value = [0, 21] 196->197 198 entropy = 0.0 samples = 1 value = [1, 0] 196->198 200 entropy = 0.0 samples = 1117 value = [0, 1117] 199->200 201 arrival_date_day_of_month <= 26.5 entropy = 0.154 samples = 45 value = [1, 44] 199->201 202 entropy = 0.0 samples = 40 value = [0, 40] 201->202 203 assigned_room_type_A <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 201->203 204 entropy = 0.0 samples = 3 value = [0, 3] 203->204 205 entropy = 1.0 samples = 2 value = [1, 1] 203->205 208 previous_bookings_not_canceled <= 0.5 entropy = 0.735 samples = 842 value = [174, 668] 207->208 499 booking_changes <= 0.5 entropy = 1.0 samples = 1113 value = [545, 568] 207->499 209 lead_time <= 0.5 entropy = 0.805 samples = 667 value = [164, 503] 208->209 460 arrival_date_day_of_month <= 6.5 entropy = 0.316 samples = 175 value = [10, 165] 208->460 210 agent_unknown <= 0.5 entropy = 0.492 samples = 177 value = [19, 158] 209->210 253 booking_changes <= 0.5 entropy = 0.876 samples = 490 value = [145, 345] 209->253 211 entropy = 0.0 samples = 40 value = [0, 40] 210->211 212 arrival_date_week_number <= 8.5 entropy = 0.581 samples = 137 value = [19, 118] 210->212 213 entropy = 0.0 samples = 27 value = [0, 27] 212->213 214 stays_in_week_nights <= 3.5 entropy = 0.664 samples = 110 value = [19, 91] 212->214 215 arrival_date_day_of_month <= 25.5 entropy = 0.628 samples = 108 value = [17, 91] 214->215 252 entropy = 0.0 samples = 2 value = [2, 0] 214->252 216 arrival_date_month_October <= 0.5 entropy = 0.503 samples = 90 value = [10, 80] 215->216 241 arrival_date_week_number <= 13.5 entropy = 0.964 samples = 18 value = [7, 11] 215->241 217 arrival_date_day_of_month <= 8.5 entropy = 0.381 samples = 81 value = [6, 75] 216->217 234 adr <= 174.5 entropy = 0.991 samples = 9 value = [4, 5] 216->234 218 entropy = 0.0 samples = 29 value = [0, 29] 217->218 219 arrival_date_week_number <= 24.5 entropy = 0.516 samples = 52 value = [6, 46] 217->219 220 stays_in_weekend_nights <= 0.5 entropy = 0.792 samples = 21 value = [5, 16] 219->220 231 adults <= 2.5 entropy = 0.206 samples = 31 value = [1, 30] 219->231 221 arrival_date_day_of_month <= 22.5 entropy = 0.94 samples = 14 value = [5, 9] 220->221 230 entropy = 0.0 samples = 7 value = [0, 7] 220->230 222 adr <= 87.5 entropy = 0.991 samples = 9 value = [5, 4] 221->222 229 entropy = 0.0 samples = 5 value = [0, 5] 221->229 223 reserved_room_type_B <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 222->223 226 booking_changes <= 1.0 entropy = 0.811 samples = 4 value = [1, 3] 222->226 224 entropy = 0.0 samples = 4 value = [4, 0] 223->224 225 entropy = 0.0 samples = 1 value = [0, 1] 223->225 227 entropy = 0.0 samples = 3 value = [0, 3] 226->227 228 entropy = 0.0 samples = 1 value = [1, 0] 226->228 232 entropy = 0.0 samples = 29 value = [0, 29] 231->232 233 entropy = 1.0 samples = 2 value = [1, 1] 231->233 235 arrival_date_day_of_month <= 3.5 entropy = 0.863 samples = 7 value = [2, 5] 234->235 240 entropy = 0.0 samples = 2 value = [2, 0] 234->240 236 entropy = 0.0 samples = 3 value = [0, 3] 235->236 237 arrival_date_day_of_month <= 5.5 entropy = 1.0 samples = 4 value = [2, 2] 235->237 238 entropy = 0.0 samples = 2 value = [2, 0] 237->238 239 entropy = 0.0 samples = 2 value = [0, 2] 237->239 242 entropy = 0.0 samples = 4 value = [4, 0] 241->242 243 adr <= 119.0 entropy = 0.75 samples = 14 value = [3, 11] 241->243 244 stays_in_week_nights <= 1.5 entropy = 0.954 samples = 8 value = [3, 5] 243->244 251 entropy = 0.0 samples = 6 value = [0, 6] 243->251 245 arrival_date_month_July <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 244->245 250 entropy = 0.0 samples = 2 value = [0, 2] 244->250 246 assigned_room_type_B <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 245->246 249 entropy = 0.0 samples = 2 value = [2, 0] 245->249 247 entropy = 0.0 samples = 3 value = [0, 3] 246->247 248 entropy = 0.0 samples = 1 value = [1, 0] 246->248 254 adr <= 91.1 entropy = 0.921 samples = 399 value = [134, 265] 253->254 433 arrival_date_month_May <= 0.5 entropy = 0.532 samples = 91 value = [11, 80] 253->433 255 arrival_date_day_of_month <= 20.5 entropy = 0.854 samples = 283 value = [79, 204] 254->255 374 total_of_special_requests <= 0.5 entropy = 0.998 samples = 116 value = [55, 61] 254->374 256 stays_in_weekend_nights <= 0.5 entropy = 0.713 samples = 189 value = [37, 152] 255->256 331 arrival_date_week_number <= 39.5 entropy = 0.992 samples = 94 value = [42, 52] 255->331 257 arrival_date_month_June <= 0.5 entropy = 0.549 samples = 118 value = [15, 103] 256->257 298 adults <= 1.5 entropy = 0.893 samples = 71 value = [22, 49] 256->298 258 adr <= 76.0 entropy = 0.485 samples = 114 value = [12, 102] 257->258 295 arrival_date_day_of_month <= 11.5 entropy = 0.811 samples = 4 value = [3, 1] 257->295 259 distribution_channel_TA/TO <= 0.5 entropy = 0.563 samples = 91 value = [12, 79] 258->259 294 entropy = 0.0 samples = 23 value = [0, 23] 258->294 260 previous_cancellations <= 0.5 entropy = 0.681 samples = 61 value = [11, 50] 259->260 289 arrival_date_day_of_month <= 2.5 entropy = 0.211 samples = 30 value = [1, 29] 259->289 261 reserved_room_type_B <= 0.5 entropy = 0.65 samples = 60 value = [10, 50] 260->261 288 entropy = 0.0 samples = 1 value = [1, 0] 260->288 262 customer_type_Transient-Party <= 0.5 entropy = 0.616 samples = 59 value = [9, 50] 261->262 287 entropy = 0.0 samples = 1 value = [1, 0] 261->287 263 arrival_date_week_number <= 2.5 entropy = 0.696 samples = 48 value = [9, 39] 262->263 286 entropy = 0.0 samples = 11 value = [0, 11] 262->286 264 entropy = 0.0 samples = 1 value = [1, 0] 263->264 265 assigned_room_type_D <= 0.5 entropy = 0.658 samples = 47 value = [8, 39] 263->265 266 arrival_date_day_of_month <= 18.5 entropy = 0.776 samples = 35 value = [8, 27] 265->266 285 entropy = 0.0 samples = 12 value = [0, 12] 265->285 267 arrival_date_day_of_month <= 13.5 entropy = 0.696 samples = 32 value = [6, 26] 266->267 282 adr <= 26.5 entropy = 0.918 samples = 3 value = [2, 1] 266->282 268 assigned_room_type_E <= 0.5 entropy = 0.845 samples = 22 value = [6, 16] 267->268 281 entropy = 0.0 samples = 10 value = [0, 10] 267->281 269 arrival_date_year <= 2015.5 entropy = 0.722 samples = 20 value = [4, 16] 268->269 280 entropy = 0.0 samples = 2 value = [2, 0] 268->280 270 entropy = 0.0 samples = 10 value = [0, 10] 269->270 271 assigned_room_type_A <= 0.5 entropy = 0.971 samples = 10 value = [4, 6] 269->271 272 entropy = 0.0 samples = 3 value = [0, 3] 271->272 273 distribution_channel_Direct <= 0.5 entropy = 0.985 samples = 7 value = [4, 3] 271->273 274 lead_time <= 6.5 entropy = 0.971 samples = 5 value = [2, 3] 273->274 279 entropy = 0.0 samples = 2 value = [2, 0] 273->279 275 arrival_date_month_March <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 274->275 278 entropy = 0.0 samples = 1 value = [1, 0] 274->278 276 entropy = 0.0 samples = 2 value = [0, 2] 275->276 277 entropy = 1.0 samples = 2 value = [1, 1] 275->277 283 entropy = 0.0 samples = 1 value = [0, 1] 282->283 284 entropy = 0.0 samples = 2 value = [2, 0] 282->284 290 stays_in_week_nights <= 1.5 entropy = 0.918 samples = 3 value = [1, 2] 289->290 293 entropy = 0.0 samples = 27 value = [0, 27] 289->293 291 entropy = 0.0 samples = 2 value = [0, 2] 290->291 292 entropy = 0.0 samples = 1 value = [1, 0] 290->292 296 entropy = 0.0 samples = 3 value = [3, 0] 295->296 297 entropy = 0.0 samples = 1 value = [0, 1] 295->297 299 arrival_date_week_number <= 9.5 entropy = 0.544 samples = 32 value = [4, 28] 298->299 308 adr <= 87.3 entropy = 0.996 samples = 39 value = [18, 21] 298->308 300 adr <= 38.5 entropy = 0.985 samples = 7 value = [3, 4] 299->300 305 stays_in_weekend_nights <= 3.5 entropy = 0.242 samples = 25 value = [1, 24] 299->305 301 entropy = 0.0 samples = 3 value = [0, 3] 300->301 302 meal_SC <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 300->302 303 entropy = 0.0 samples = 3 value = [3, 0] 302->303 304 entropy = 0.0 samples = 1 value = [0, 1] 302->304 306 entropy = 0.0 samples = 24 value = [0, 24] 305->306 307 entropy = 0.0 samples = 1 value = [1, 0] 305->307 309 distribution_channel_Direct <= 0.5 entropy = 0.998 samples = 34 value = [18, 16] 308->309 330 entropy = 0.0 samples = 5 value = [0, 5] 308->330 310 lead_time <= 2.5 entropy = 0.931 samples = 26 value = [17, 9] 309->310 327 total_of_special_requests <= 1.5 entropy = 0.544 samples = 8 value = [1, 7] 309->327 311 entropy = 0.0 samples = 3 value = [0, 3] 310->311 312 arrival_date_week_number <= 1.5 entropy = 0.828 samples = 23 value = [17, 6] 310->312 313 entropy = 0.0 samples = 1 value = [0, 1] 312->313 314 arrival_date_day_of_month <= 6.0 entropy = 0.773 samples = 22 value = [17, 5] 312->314 315 entropy = 0.0 samples = 6 value = [6, 0] 314->315 316 meal_BB <= 0.5 entropy = 0.896 samples = 16 value = [11, 5] 314->316 317 entropy = 0.0 samples = 2 value = [0, 2] 316->317 318 adr <= 63.5 entropy = 0.75 samples = 14 value = [11, 3] 316->318 319 arrival_date_day_of_month <= 8.5 entropy = 0.954 samples = 8 value = [5, 3] 318->319 326 entropy = 0.0 samples = 6 value = [6, 0] 318->326 320 entropy = 0.0 samples = 3 value = [3, 0] 319->320 321 stays_in_week_nights <= 1.5 entropy = 0.971 samples = 5 value = [2, 3] 319->321 322 entropy = 0.0 samples = 2 value = [0, 2] 321->322 323 customer_type_Transient <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 321->323 324 entropy = 1.0 samples = 2 value = [1, 1] 323->324 325 entropy = 0.0 samples = 1 value = [1, 0] 323->325 328 entropy = 0.0 samples = 6 value = [0, 6] 327->328 329 entropy = 1.0 samples = 2 value = [1, 1] 327->329 332 stays_in_weekend_nights <= 0.5 entropy = 0.969 samples = 53 value = [32, 21] 331->332 361 stays_in_week_nights <= 2.5 entropy = 0.801 samples = 41 value = [10, 31] 331->361 333 arrival_date_day_of_month <= 26.5 entropy = 0.996 samples = 28 value = [13, 15] 332->333 352 arrival_date_day_of_month <= 27.5 entropy = 0.795 samples = 25 value = [19, 6] 332->352 334 lead_time <= 8.5 entropy = 0.971 samples = 20 value = [12, 8] 333->334 349 arrival_date_day_of_month <= 30.5 entropy = 0.544 samples = 8 value = [1, 7] 333->349 335 stays_in_week_nights <= 1.5 entropy = 0.837 samples = 15 value = [11, 4] 334->335 344 adr <= 48.5 entropy = 0.722 samples = 5 value = [1, 4] 334->344 336 market_segment_Direct <= 0.5 entropy = 1.0 samples = 8 value = [4, 4] 335->336 343 entropy = 0.0 samples = 7 value = [7, 0] 335->343 337 adr <= 70.0 entropy = 0.918 samples = 6 value = [4, 2] 336->337 342 entropy = 0.0 samples = 2 value = [0, 2] 336->342 338 arrival_date_week_number <= 19.0 entropy = 0.918 samples = 3 value = [1, 2] 337->338 341 entropy = 0.0 samples = 3 value = [3, 0] 337->341 339 entropy = 0.0 samples = 2 value = [0, 2] 338->339 340 entropy = 0.0 samples = 1 value = [1, 0] 338->340 345 entropy = 0.0 samples = 2 value = [0, 2] 344->345 346 adults <= 2.5 entropy = 0.918 samples = 3 value = [1, 2] 344->346 347 entropy = 1.0 samples = 2 value = [1, 1] 346->347 348 entropy = 0.0 samples = 1 value = [0, 1] 346->348 350 entropy = 0.0 samples = 7 value = [0, 7] 349->350 351 entropy = 0.0 samples = 1 value = [1, 0] 349->351 353 arrival_date_day_of_month <= 24.5 entropy = 0.971 samples = 15 value = [9, 6] 352->353 360 entropy = 0.0 samples = 10 value = [10, 0] 352->360 354 entropy = 0.0 samples = 6 value = [6, 0] 353->354 355 adr <= 58.5 entropy = 0.918 samples = 9 value = [3, 6] 353->355 356 entropy = 0.0 samples = 4 value = [0, 4] 355->356 357 adr <= 74.0 entropy = 0.971 samples = 5 value = [3, 2] 355->357 358 entropy = 0.0 samples = 3 value = [3, 0] 357->358 359 entropy = 0.0 samples = 2 value = [0, 2] 357->359 362 arrival_date_week_number <= 44.5 entropy = 0.639 samples = 37 value = [6, 31] 361->362 373 entropy = 0.0 samples = 4 value = [4, 0] 361->373 363 entropy = 0.0 samples = 16 value = [0, 16] 362->363 364 adr <= 37.75 entropy = 0.863 samples = 21 value = [6, 15] 362->364 365 entropy = 0.0 samples = 7 value = [0, 7] 364->365 366 adr <= 62.5 entropy = 0.985 samples = 14 value = [6, 8] 364->366 367 arrival_date_year <= 2015.5 entropy = 0.971 samples = 10 value = [6, 4] 366->367 372 entropy = 0.0 samples = 4 value = [0, 4] 366->372 368 customer_type_Transient <= 0.5 entropy = 0.918 samples = 6 value = [2, 4] 367->368 371 entropy = 0.0 samples = 4 value = [4, 0] 367->371 369 entropy = 0.0 samples = 2 value = [2, 0] 368->369 370 entropy = 0.0 samples = 4 value = [0, 4] 368->370 375 customer_type_Transient <= 0.5 entropy = 0.992 samples = 85 value = [47, 38] 374->375 422 arrival_date_day_of_month <= 16.5 entropy = 0.824 samples = 31 value = [8, 23] 374->422 376 lead_time <= 17.5 entropy = 0.887 samples = 23 value = [7, 16] 375->376 391 lead_time <= 4.5 entropy = 0.938 samples = 62 value = [40, 22] 375->391 377 distribution_channel_TA/TO <= 0.5 entropy = 0.792 samples = 21 value = [5, 16] 376->377 390 entropy = 0.0 samples = 2 value = [2, 0] 376->390 378 entropy = 0.0 samples = 8 value = [0, 8] 377->378 379 agent_1 <= 0.5 entropy = 0.961 samples = 13 value = [5, 8] 377->379 380 stays_in_weekend_nights <= 0.5 entropy = 1.0 samples = 10 value = [5, 5] 379->380 389 entropy = 0.0 samples = 3 value = [0, 3] 379->389 381 agent_listed_other <= 0.5 entropy = 0.954 samples = 8 value = [3, 5] 380->381 388 entropy = 0.0 samples = 2 value = [2, 0] 380->388 382 entropy = 0.0 samples = 1 value = [1, 0] 381->382 383 previous_cancellations <= 0.5 entropy = 0.863 samples = 7 value = [2, 5] 381->383 384 adr <= 143.0 entropy = 0.65 samples = 6 value = [1, 5] 383->384 387 entropy = 0.0 samples = 1 value = [1, 0] 383->387 385 entropy = 0.0 samples = 4 value = [0, 4] 384->385 386 entropy = 1.0 samples = 2 value = [1, 1] 384->386 392 arrival_date_day_of_month <= 11.5 entropy = 0.991 samples = 27 value = [12, 15] 391->392 407 agent_listed_other <= 0.5 entropy = 0.722 samples = 35 value = [28, 7] 391->407 393 entropy = 0.0 samples = 8 value = [0, 8] 392->393 394 stays_in_week_nights <= 1.5 entropy = 0.949 samples = 19 value = [12, 7] 392->394 395 lead_time <= 2.5 entropy = 0.997 samples = 15 value = [8, 7] 394->395 406 entropy = 0.0 samples = 4 value = [4, 0] 394->406 396 market_segment_Corporate <= 0.5 entropy = 0.961 samples = 13 value = [8, 5] 395->396 405 entropy = 0.0 samples = 2 value = [0, 2] 395->405 397 arrival_date_day_of_month <= 18.5 entropy = 0.845 samples = 11 value = [8, 3] 396->397 404 entropy = 0.0 samples = 2 value = [0, 2] 396->404 398 entropy = 0.0 samples = 4 value = [4, 0] 397->398 399 lead_time <= 1.5 entropy = 0.985 samples = 7 value = [4, 3] 397->399 400 adr <= 98.5 entropy = 0.811 samples = 4 value = [1, 3] 399->400 403 entropy = 0.0 samples = 3 value = [3, 0] 399->403 401 entropy = 0.0 samples = 1 value = [1, 0] 400->401 402 entropy = 0.0 samples = 3 value = [0, 3] 400->402 408 assigned_room_type_B <= 0.5 entropy = 0.439 samples = 22 value = [20, 2] 407->408 413 adults <= 1.5 entropy = 0.961 samples = 13 value = [8, 5] 407->413 409 arrival_date_day_of_month <= 3.0 entropy = 0.276 samples = 21 value = [20, 1] 408->409 412 entropy = 0.0 samples = 1 value = [0, 1] 408->412 410 entropy = 1.0 samples = 2 value = [1, 1] 409->410 411 entropy = 0.0 samples = 19 value = [19, 0] 409->411 414 entropy = 0.0 samples = 2 value = [0, 2] 413->414 415 assigned_room_type_A <= 0.5 entropy = 0.845 samples = 11 value = [8, 3] 413->415 416 assigned_room_type_E <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 415->416 421 entropy = 0.0 samples = 5 value = [5, 0] 415->421 417 arrival_date_month_March <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 416->417 420 entropy = 0.0 samples = 2 value = [2, 0] 416->420 418 entropy = 0.0 samples = 3 value = [0, 3] 417->418 419 entropy = 0.0 samples = 1 value = [1, 0] 417->419 423 arrival_date_day_of_month <= 7.5 entropy = 0.997 samples = 15 value = [7, 8] 422->423 430 adr <= 110.0 entropy = 0.337 samples = 16 value = [1, 15] 422->430 424 entropy = 0.0 samples = 5 value = [0, 5] 423->424 425 adr <= 109.5 entropy = 0.881 samples = 10 value = [7, 3] 423->425 426 market_segment_Corporate <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 425->426 429 entropy = 0.0 samples = 6 value = [6, 0] 425->429 427 entropy = 0.0 samples = 3 value = [0, 3] 426->427 428 entropy = 0.0 samples = 1 value = [1, 0] 426->428 431 entropy = 0.0 samples = 1 value = [1, 0] 430->431 432 entropy = 0.0 samples = 15 value = [0, 15] 430->432 434 assigned_room_type_A <= 0.5 entropy = 0.378 samples = 82 value = [6, 76] 433->434 455 adults <= 1.5 entropy = 0.991 samples = 9 value = [5, 4] 433->455 435 entropy = 0.0 samples = 31 value = [0, 31] 434->435 436 customer_type_Transient <= 0.5 entropy = 0.523 samples = 51 value = [6, 45] 434->436 437 entropy = 0.0 samples = 16 value = [0, 16] 436->437 438 previous_cancellations <= 0.5 entropy = 0.661 samples = 35 value = [6, 29] 436->438 439 adults <= 2.5 entropy = 0.602 samples = 34 value = [5, 29] 438->439 454 entropy = 0.0 samples = 1 value = [1, 0] 438->454 440 adults <= 1.5 entropy = 0.533 samples = 33 value = [4, 29] 439->440 453 entropy = 0.0 samples = 1 value = [1, 0] 439->453 441 total_of_special_requests <= 0.5 entropy = 0.787 samples = 17 value = [4, 13] 440->441 452 entropy = 0.0 samples = 16 value = [0, 16] 440->452 442 meal_BB <= 0.5 entropy = 0.592 samples = 14 value = [2, 12] 441->442 449 arrival_date_day_of_month <= 21.5 entropy = 0.918 samples = 3 value = [2, 1] 441->449 443 entropy = 0.0 samples = 1 value = [1, 0] 442->443 444 arrival_date_day_of_month <= 25.5 entropy = 0.391 samples = 13 value = [1, 12] 442->444 445 entropy = 0.0 samples = 10 value = [0, 10] 444->445 446 agent_unknown <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 444->446 447 entropy = 0.0 samples = 1 value = [1, 0] 446->447 448 entropy = 0.0 samples = 2 value = [0, 2] 446->448 450 entropy = 0.0 samples = 2 value = [2, 0] 449->450 451 entropy = 0.0 samples = 1 value = [0, 1] 449->451 456 arrival_date_day_of_month <= 7.0 entropy = 0.65 samples = 6 value = [5, 1] 455->456 459 entropy = 0.0 samples = 3 value = [0, 3] 455->459 457 entropy = 0.0 samples = 1 value = [0, 1] 456->457 458 entropy = 0.0 samples = 5 value = [5, 0] 456->458 461 entropy = 0.0 samples = 34 value = [0, 34] 460->461 462 arrival_date_day_of_month <= 8.5 entropy = 0.369 samples = 141 value = [10, 131] 460->462 463 distribution_channel_Corporate <= 0.5 entropy = 0.811 samples = 12 value = [3, 9] 462->463 470 previous_bookings_not_canceled <= 5.5 entropy = 0.304 samples = 129 value = [7, 122] 462->470 464 entropy = 0.0 samples = 5 value = [0, 5] 463->464 465 assigned_room_type_D <= 0.5 entropy = 0.985 samples = 7 value = [3, 4] 463->465 466 market_segment_Aviation <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 465->466 469 entropy = 0.0 samples = 2 value = [2, 0] 465->469 467 entropy = 0.0 samples = 4 value = [0, 4] 466->467 468 entropy = 0.0 samples = 1 value = [1, 0] 466->468 471 assigned_room_type_A <= 0.5 entropy = 0.358 samples = 103 value = [7, 96] 470->471 498 entropy = 0.0 samples = 26 value = [0, 26] 470->498 472 previous_bookings_not_canceled <= 4.5 entropy = 0.156 samples = 44 value = [1, 43] 471->472 477 arrival_date_day_of_month <= 24.5 entropy = 0.474 samples = 59 value = [6, 53] 471->477 473 entropy = 0.0 samples = 39 value = [0, 39] 472->473 474 arrival_date_day_of_month <= 25.0 entropy = 0.722 samples = 5 value = [1, 4] 472->474 475 entropy = 0.0 samples = 4 value = [0, 4] 474->475 476 entropy = 0.0 samples = 1 value = [1, 0] 474->476 478 hotel_City Hotel <= 0.5 entropy = 0.583 samples = 43 value = [6, 37] 477->478 497 entropy = 0.0 samples = 16 value = [0, 16] 477->497 479 stays_in_week_nights <= 0.5 entropy = 0.276 samples = 21 value = [1, 20] 478->479 482 lead_time <= 13.5 entropy = 0.773 samples = 22 value = [5, 17] 478->482 480 entropy = 1.0 samples = 2 value = [1, 1] 479->480 481 entropy = 0.0 samples = 19 value = [0, 19] 479->481 483 market_segment_Direct <= 0.5 entropy = 0.702 samples = 21 value = [4, 17] 482->483 496 entropy = 0.0 samples = 1 value = [1, 0] 482->496 484 arrival_date_month_September <= 0.5 entropy = 0.61 samples = 20 value = [3, 17] 483->484 495 entropy = 0.0 samples = 1 value = [1, 0] 483->495 485 arrival_date_day_of_month <= 17.5 entropy = 0.485 samples = 19 value = [2, 17] 484->485 494 entropy = 0.0 samples = 1 value = [1, 0] 484->494 486 entropy = 0.0 samples = 10 value = [0, 10] 485->486 487 stays_in_weekend_nights <= 0.5 entropy = 0.764 samples = 9 value = [2, 7] 485->487 488 arrival_date_month_November <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 487->488 493 entropy = 0.0 samples = 5 value = [0, 5] 487->493 489 arrival_date_day_of_month <= 19.0 entropy = 0.918 samples = 3 value = [1, 2] 488->489 492 entropy = 0.0 samples = 1 value = [1, 0] 488->492 490 entropy = 0.0 samples = 1 value = [1, 0] 489->490 491 entropy = 0.0 samples = 2 value = [0, 2] 489->491 500 total_of_special_requests <= 0.5 entropy = 0.993 samples = 921 value = [505, 416] 499->500 857 stays_in_week_nights <= 5.5 entropy = 0.738 samples = 192 value = [40, 152] 499->857 501 is_repeated_guest <= 0.5 entropy = 0.977 samples = 744 value = [438, 306] 500->501 770 adults <= 1.5 entropy = 0.957 samples = 177 value = [67, 110] 500->770 502 previous_cancellations <= 0.5 entropy = 0.967 samples = 714 value = [433, 281] 501->502 765 lead_time <= 138.5 entropy = 0.65 samples = 30 value = [5, 25] 501->765 503 arrival_date_year <= 2015.5 entropy = 0.987 samples = 619 value = [351, 268] 502->503 752 lead_time <= 276.0 entropy = 0.576 samples = 95 value = [82, 13] 502->752 504 arrival_date_month_August <= 0.5 entropy = 0.945 samples = 193 value = [70, 123] 503->504 591 stays_in_weekend_nights <= 3.5 entropy = 0.925 samples = 426 value = [281, 145] 503->591 505 meal_FB <= 0.5 entropy = 0.826 samples = 158 value = [41, 117] 504->505 578 lead_time <= 42.5 entropy = 0.661 samples = 35 value = [29, 6] 504->578 506 days_in_waiting_list <= 20.0 entropy = 0.79 samples = 152 value = [36, 116] 505->506 575 assigned_room_type_D <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 505->575 507 lead_time <= 217.5 entropy = 0.837 samples = 131 value = [35, 96] 506->507 570 adr <= 72.5 entropy = 0.276 samples = 21 value = [1, 20] 506->570 508 lead_time <= 207.0 entropy = 0.892 samples = 110 value = [34, 76] 507->508 567 lead_time <= 356.0 entropy = 0.276 samples = 21 value = [1, 20] 507->567 509 customer_type_Transient <= 0.5 entropy = 0.868 samples = 107 value = [31, 76] 508->509 566 entropy = 0.0 samples = 3 value = [3, 0] 508->566 510 adr <= 71.615 entropy = 0.778 samples = 87 value = [20, 67] 509->510 553 arrival_date_month_September <= 0.5 entropy = 0.993 samples = 20 value = [11, 9] 509->553 511 adr <= 67.0 entropy = 0.971 samples = 30 value = [12, 18] 510->511 532 adr <= 118.0 entropy = 0.585 samples = 57 value = [8, 49] 510->532 512 assigned_room_type_A <= 0.5 entropy = 0.918 samples = 27 value = [9, 18] 511->512 531 entropy = 0.0 samples = 3 value = [3, 0] 511->531 513 entropy = 0.0 samples = 7 value = [0, 7] 512->513 514 lead_time <= 41.5 entropy = 0.993 samples = 20 value = [9, 11] 512->514 515 entropy = 0.0 samples = 2 value = [2, 0] 514->515 516 lead_time <= 54.0 entropy = 0.964 samples = 18 value = [7, 11] 514->516 517 entropy = 0.0 samples = 3 value = [0, 3] 516->517 518 adr <= 63.0 entropy = 0.997 samples = 15 value = [7, 8] 516->518 519 arrival_date_week_number <= 42.5 entropy = 0.996 samples = 13 value = [7, 6] 518->519 530 entropy = 0.0 samples = 2 value = [0, 2] 518->530 520 arrival_date_month_October <= 0.5 entropy = 0.994 samples = 11 value = [5, 6] 519->520 529 entropy = 0.0 samples = 2 value = [2, 0] 519->529 521 stays_in_weekend_nights <= 0.5 entropy = 0.991 samples = 9 value = [5, 4] 520->521 528 entropy = 0.0 samples = 2 value = [0, 2] 520->528 522 adr <= 30.0 entropy = 0.985 samples = 7 value = [3, 4] 521->522 527 entropy = 0.0 samples = 2 value = [2, 0] 521->527 523 entropy = 0.0 samples = 1 value = [1, 0] 522->523 524 adr <= 61.0 entropy = 0.918 samples = 6 value = [2, 4] 522->524 525 entropy = 0.811 samples = 4 value = [1, 3] 524->525 526 entropy = 1.0 samples = 2 value = [1, 1] 524->526 533 assigned_room_type_A <= 0.5 entropy = 0.463 samples = 51 value = [5, 46] 532->533 550 meal_HB <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 532->550 534 entropy = 0.0 samples = 15 value = [0, 15] 533->534 535 arrival_date_week_number <= 41.5 entropy = 0.581 samples = 36 value = [5, 31] 533->535 536 lead_time <= 88.5 entropy = 0.706 samples = 26 value = [5, 21] 535->536 549 entropy = 0.0 samples = 10 value = [0, 10] 535->549 537 arrival_date_month_October <= 0.5 entropy = 0.971 samples = 10 value = [4, 6] 536->537 546 adults <= 1.5 entropy = 0.337 samples = 16 value = [1, 15] 536->546 538 adr <= 108.5 entropy = 0.811 samples = 8 value = [2, 6] 537->538 545 entropy = 0.0 samples = 2 value = [2, 0] 537->545 539 market_segment_Groups <= 0.5 entropy = 0.971 samples = 5 value = [2, 3] 538->539 544 entropy = 0.0 samples = 3 value = [0, 3] 538->544 540 entropy = 0.0 samples = 1 value = [1, 0] 539->540 541 lead_time <= 49.0 entropy = 0.811 samples = 4 value = [1, 3] 539->541 542 entropy = 0.918 samples = 3 value = [1, 2] 541->542 543 entropy = 0.0 samples = 1 value = [0, 1] 541->543 547 entropy = 0.503 samples = 9 value = [1, 8] 546->547 548 entropy = 0.0 samples = 7 value = [0, 7] 546->548 551 entropy = 0.0 samples = 3 value = [0, 3] 550->551 552 entropy = 0.0 samples = 3 value = [3, 0] 550->552 554 meal_HB <= 0.5 entropy = 0.998 samples = 17 value = [8, 9] 553->554 565 entropy = 0.0 samples = 3 value = [3, 0] 553->565 555 arrival_date_day_of_month <= 13.5 entropy = 0.845 samples = 11 value = [3, 8] 554->555 562 market_segment_Corporate <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 554->562 556 market_segment_Offline TA/TO <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 555->556 561 entropy = 0.0 samples = 5 value = [0, 5] 555->561 557 entropy = 0.0 samples = 2 value = [2, 0] 556->557 558 stays_in_weekend_nights <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 556->558 559 entropy = 0.0 samples = 1 value = [1, 0] 558->559 560 entropy = 0.0 samples = 3 value = [0, 3] 558->560 563 entropy = 0.0 samples = 5 value = [5, 0] 562->563 564 entropy = 0.0 samples = 1 value = [0, 1] 562->564 568 entropy = 0.0 samples = 20 value = [0, 20] 567->568 569 entropy = 0.0 samples = 1 value = [1, 0] 567->569 571 lead_time <= 100.0 entropy = 0.918 samples = 3 value = [1, 2] 570->571 574 entropy = 0.0 samples = 18 value = [0, 18] 570->574 572 entropy = 0.0 samples = 2 value = [0, 2] 571->572 573 entropy = 0.0 samples = 1 value = [1, 0] 571->573 576 entropy = 0.0 samples = 5 value = [5, 0] 575->576 577 entropy = 0.0 samples = 1 value = [0, 1] 575->577 579 adr <= 196.625 entropy = 0.267 samples = 22 value = [21, 1] 578->579 582 adr <= 134.0 entropy = 0.961 samples = 13 value = [8, 5] 578->582 580 entropy = 0.0 samples = 21 value = [21, 0] 579->580 581 entropy = 0.0 samples = 1 value = [0, 1] 579->581 583 stays_in_weekend_nights <= 1.0 entropy = 1.0 samples = 10 value = [5, 5] 582->583 590 entropy = 0.0 samples = 3 value = [3, 0] 582->590 584 arrival_date_day_of_month <= 27.0 entropy = 0.863 samples = 7 value = [5, 2] 583->584 589 entropy = 0.0 samples = 3 value = [0, 3] 583->589 585 lead_time <= 52.5 entropy = 0.918 samples = 6 value = [4, 2] 584->585 588 entropy = 0.0 samples = 1 value = [1, 0] 584->588 586 entropy = 0.811 samples = 4 value = [3, 1] 585->586 587 entropy = 1.0 samples = 2 value = [1, 1] 585->587 592 customer_type_Transient-Party <= 0.5 entropy = 0.937 samples = 411 value = [266, 145] 591->592 751 entropy = 0.0 samples = 15 value = [15, 0] 591->751 593 arrival_date_year <= 2016.5 entropy = 0.999 samples = 138 value = [72, 66] 592->593 666 meal_SC <= 0.5 entropy = 0.868 samples = 273 value = [194, 79] 592->666 594 arrival_date_day_of_month <= 10.5 entropy = 0.983 samples = 92 value = [39, 53] 593->594 643 arrival_date_week_number <= 13.5 entropy = 0.859 samples = 46 value = [33, 13] 593->643 595 lead_time <= 75.0 entropy = 0.722 samples = 25 value = [5, 20] 594->595 606 meal_HB <= 0.5 entropy = 1.0 samples = 67 value = [34, 33] 594->606 596 entropy = 0.0 samples = 11 value = [0, 11] 595->596 597 stays_in_weekend_nights <= 0.5 entropy = 0.94 samples = 14 value = [5, 9] 595->597 598 arrival_date_month_May <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 597->598 601 lead_time <= 230.0 entropy = 0.722 samples = 10 value = [2, 8] 597->601 599 entropy = 0.0 samples = 3 value = [3, 0] 598->599 600 entropy = 0.0 samples = 1 value = [0, 1] 598->600 602 entropy = 0.0 samples = 7 value = [0, 7] 601->602 603 arrival_date_month_April <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 601->603 604 entropy = 0.0 samples = 2 value = [2, 0] 603->604 605 entropy = 0.0 samples = 1 value = [0, 1] 603->605 607 stays_in_week_nights <= 2.5 entropy = 0.985 samples = 56 value = [32, 24] 606->607 638 arrival_date_month_August <= 0.5 entropy = 0.684 samples = 11 value = [2, 9] 606->638 608 adults <= 1.5 entropy = 0.985 samples = 35 value = [15, 20] 607->608 629 hotel_Resort Hotel <= 0.5 entropy = 0.702 samples = 21 value = [17, 4] 607->629 609 distribution_channel_Corporate <= 0.5 entropy = 0.811 samples = 8 value = [6, 2] 608->609 612 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 27 value = [9, 18] 608->612 610 entropy = 0.0 samples = 6 value = [6, 0] 609->610 611 entropy = 0.0 samples = 2 value = [0, 2] 609->611 613 stays_in_week_nights <= 1.5 entropy = 0.742 samples = 19 value = [4, 15] 612->613 624 lead_time <= 42.0 entropy = 0.954 samples = 8 value = [5, 3] 612->624 614 entropy = 0.0 samples = 7 value = [0, 7] 613->614 615 arrival_date_day_of_month <= 24.5 entropy = 0.918 samples = 12 value = [4, 8] 613->615 616 arrival_date_week_number <= 42.5 entropy = 0.592 samples = 7 value = [1, 6] 615->616 619 market_segment_Offline TA/TO <= 0.5 entropy = 0.971 samples = 5 value = [3, 2] 615->619 617 entropy = 0.0 samples = 5 value = [0, 5] 616->617 618 entropy = 1.0 samples = 2 value = [1, 1] 616->618 620 entropy = 0.0 samples = 2 value = [2, 0] 619->620 621 arrival_date_day_of_month <= 25.5 entropy = 0.918 samples = 3 value = [1, 2] 619->621 622 entropy = 0.0 samples = 1 value = [1, 0] 621->622 623 entropy = 0.0 samples = 2 value = [0, 2] 621->623 625 arrival_date_month_May <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 624->625 628 entropy = 0.0 samples = 4 value = [4, 0] 624->628 626 entropy = 0.0 samples = 3 value = [0, 3] 625->626 627 entropy = 0.0 samples = 1 value = [1, 0] 625->627 630 arrival_date_month_July <= 0.5 entropy = 0.918 samples = 12 value = [8, 4] 629->630 637 entropy = 0.0 samples = 9 value = [9, 0] 629->637 631 arrival_date_day_of_month <= 21.5 entropy = 0.722 samples = 10 value = [8, 2] 630->631 636 entropy = 0.0 samples = 2 value = [0, 2] 630->636 632 entropy = 0.0 samples = 6 value = [6, 0] 631->632 633 arrival_date_day_of_month <= 27.0 entropy = 1.0 samples = 4 value = [2, 2] 631->633 634 entropy = 0.0 samples = 2 value = [0, 2] 633->634 635 entropy = 0.0 samples = 2 value = [2, 0] 633->635 639 entropy = 0.0 samples = 8 value = [0, 8] 638->639 640 adr <= 125.625 entropy = 0.918 samples = 3 value = [2, 1] 638->640 641 entropy = 0.0 samples = 1 value = [0, 1] 640->641 642 entropy = 0.0 samples = 2 value = [2, 0] 640->642 644 lead_time <= 43.0 entropy = 0.918 samples = 9 value = [3, 6] 643->644 649 arrival_date_week_number <= 25.5 entropy = 0.7 samples = 37 value = [30, 7] 643->649 645 distribution_channel_TA/TO <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 644->645 648 entropy = 0.0 samples = 5 value = [0, 5] 644->648 646 entropy = 0.0 samples = 1 value = [0, 1] 645->646 647 entropy = 0.0 samples = 3 value = [3, 0] 645->647 650 entropy = 0.0 samples = 14 value = [14, 0] 649->650 651 lead_time <= 45.0 entropy = 0.887 samples = 23 value = [16, 7] 649->651 652 entropy = 0.0 samples = 2 value = [0, 2] 651->652 653 adr <= 151.1 entropy = 0.792 samples = 21 value = [16, 5] 651->653 654 adr <= 141.095 entropy = 0.918 samples = 15 value = [10, 5] 653->654 665 entropy = 0.0 samples = 6 value = [6, 0] 653->665 655 arrival_date_day_of_month <= 3.0 entropy = 0.779 samples = 13 value = [10, 3] 654->655 664 entropy = 0.0 samples = 2 value = [0, 2] 654->664 656 adults <= 2.5 entropy = 0.918 samples = 3 value = [1, 2] 655->656 659 arrival_date_day_of_month <= 27.0 entropy = 0.469 samples = 10 value = [9, 1] 655->659 657 entropy = 0.0 samples = 2 value = [0, 2] 656->657 658 entropy = 0.0 samples = 1 value = [1, 0] 656->658 660 entropy = 0.0 samples = 7 value = [7, 0] 659->660 661 distribution_channel_TA/TO <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 659->661 662 entropy = 0.0 samples = 2 value = [2, 0] 661->662 663 entropy = 0.0 samples = 1 value = [0, 1] 661->663 667 market_segment_Corporate <= 0.5 entropy = 0.837 samples = 262 value = [192, 70] 666->667 744 stays_in_week_nights <= 4.0 entropy = 0.684 samples = 11 value = [2, 9] 666->744 668 arrival_date_week_number <= 42.5 entropy = 0.801 samples = 242 value = [183, 59] 667->668 739 arrival_date_year <= 2016.5 entropy = 0.993 samples = 20 value = [9, 11] 667->739 669 adr <= 131.0 entropy = 0.837 samples = 221 value = [162, 59] 668->669 738 entropy = 0.0 samples = 21 value = [21, 0] 668->738 670 lead_time <= 43.5 entropy = 0.817 samples = 217 value = [162, 55] 669->670 737 entropy = 0.0 samples = 4 value = [0, 4] 669->737 671 assigned_room_type_F <= 0.5 entropy = 0.229 samples = 27 value = [26, 1] 670->671 674 lead_time <= 420.5 entropy = 0.861 samples = 190 value = [136, 54] 670->674 672 entropy = 0.0 samples = 26 value = [26, 0] 671->672 673 entropy = 0.0 samples = 1 value = [0, 1] 671->673 675 lead_time <= 291.0 entropy = 0.883 samples = 179 value = [125, 54] 674->675 736 entropy = 0.0 samples = 11 value = [11, 0] 674->736 676 agent_1 <= 0.5 entropy = 0.811 samples = 148 value = [111, 37] 675->676 725 adr <= 104.0 entropy = 0.993 samples = 31 value = [14, 17] 675->725 677 arrival_date_month_May <= 0.5 entropy = 0.751 samples = 135 value = [106, 29] 676->677 722 lead_time <= 223.0 entropy = 0.961 samples = 13 value = [5, 8] 676->722 678 adr <= 93.5 entropy = 0.822 samples = 113 value = [84, 29] 677->678 721 entropy = 0.0 samples = 22 value = [22, 0] 677->721 679 adr <= 25.5 entropy = 0.627 samples = 70 value = [59, 11] 678->679 702 distribution_channel_TA/TO <= 0.5 entropy = 0.981 samples = 43 value = [25, 18] 678->702 680 entropy = 0.0 samples = 2 value = [0, 2] 679->680 681 lead_time <= 102.5 entropy = 0.564 samples = 68 value = [59, 9] 679->681 682 entropy = 0.0 samples = 24 value = [24, 0] 681->682 683 adr <= 64.4 entropy = 0.731 samples = 44 value = [35, 9] 681->683 684 entropy = 0.0 samples = 14 value = [14, 0] 683->684 685 arrival_date_week_number <= 29.5 entropy = 0.881 samples = 30 value = [21, 9] 683->685 686 arrival_date_day_of_month <= 29.0 entropy = 0.764 samples = 27 value = [21, 6] 685->686 701 entropy = 0.0 samples = 3 value = [0, 3] 685->701 687 hotel_City Hotel <= 0.5 entropy = 0.634 samples = 25 value = [21, 4] 686->687 700 entropy = 0.0 samples = 2 value = [0, 2] 686->700 688 entropy = 0.0 samples = 10 value = [10, 0] 687->688 689 assigned_room_type_B <= 0.5 entropy = 0.837 samples = 15 value = [11, 4] 687->689 690 lead_time <= 142.5 entropy = 0.75 samples = 14 value = [11, 3] 689->690 699 entropy = 0.0 samples = 1 value = [0, 1] 689->699 691 arrival_date_day_of_month <= 20.0 entropy = 0.811 samples = 12 value = [9, 3] 690->691 698 entropy = 0.0 samples = 2 value = [2, 0] 690->698 692 entropy = 1.0 samples = 2 value = [1, 1] 691->692 693 market_segment_Groups <= 0.5 entropy = 0.722 samples = 10 value = [8, 2] 691->693 694 entropy = 0.0 samples = 1 value = [1, 0] 693->694 695 assigned_room_type_D <= 0.5 entropy = 0.764 samples = 9 value = [7, 2] 693->695 696 entropy = 0.811 samples = 8 value = [6, 2] 695->696 697 entropy = 0.0 samples = 1 value = [1, 0] 695->697 703 entropy = 0.0 samples = 8 value = [8, 0] 702->703 704 lead_time <= 65.5 entropy = 0.999 samples = 35 value = [17, 18] 702->704 705 meal_HB <= 0.5 entropy = 0.503 samples = 9 value = [1, 8] 704->705 708 arrival_date_day_of_month <= 10.5 entropy = 0.961 samples = 26 value = [16, 10] 704->708 706 entropy = 0.0 samples = 3 value = [0, 3] 705->706 707 entropy = 0.65 samples = 6 value = [1, 5] 705->707 709 entropy = 0.0 samples = 6 value = [6, 0] 708->709 710 arrival_date_week_number <= 17.5 entropy = 1.0 samples = 20 value = [10, 10] 708->710 711 arrival_date_week_number <= 13.5 entropy = 0.544 samples = 8 value = [7, 1] 710->711 714 lead_time <= 128.0 entropy = 0.811 samples = 12 value = [3, 9] 710->714 712 entropy = 0.0 samples = 1 value = [0, 1] 711->712 713 entropy = 0.0 samples = 7 value = [7, 0] 711->713 715 entropy = 0.0 samples = 5 value = [0, 5] 714->715 716 lead_time <= 176.5 entropy = 0.985 samples = 7 value = [3, 4] 714->716 717 entropy = 0.0 samples = 2 value = [2, 0] 716->717 718 stays_in_week_nights <= 2.5 entropy = 0.722 samples = 5 value = [1, 4] 716->718 719 entropy = 0.0 samples = 3 value = [0, 3] 718->719 720 entropy = 1.0 samples = 2 value = [1, 1] 718->720 723 entropy = 0.0 samples = 8 value = [0, 8] 722->723 724 entropy = 0.0 samples = 5 value = [5, 0] 722->724 726 hotel_City Hotel <= 0.5 entropy = 0.98 samples = 24 value = [14, 10] 725->726 735 entropy = 0.0 samples = 7 value = [0, 7] 725->735 727 stays_in_weekend_nights <= 1.5 entropy = 0.89 samples = 13 value = [4, 9] 726->727 732 arrival_date_week_number <= 19.0 entropy = 0.439 samples = 11 value = [10, 1] 726->732 728 entropy = 0.0 samples = 7 value = [0, 7] 727->728 729 lead_time <= 335.5 entropy = 0.918 samples = 6 value = [4, 2] 727->729 730 entropy = 0.0 samples = 4 value = [4, 0] 729->730 731 entropy = 0.0 samples = 2 value = [0, 2] 729->731 733 entropy = 1.0 samples = 2 value = [1, 1] 732->733 734 entropy = 0.0 samples = 9 value = [9, 0] 732->734 740 arrival_date_month_July <= 0.5 entropy = 0.414 samples = 12 value = [1, 11] 739->740 743 entropy = 0.0 samples = 8 value = [8, 0] 739->743 741 entropy = 0.0 samples = 10 value = [0, 10] 740->741 742 entropy = 1.0 samples = 2 value = [1, 1] 740->742 745 adults <= 1.5 entropy = 0.469 samples = 10 value = [1, 9] 744->745 750 entropy = 0.0 samples = 1 value = [1, 0] 744->750 746 assigned_room_type_C <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 745->746 749 entropy = 0.0 samples = 5 value = [0, 5] 745->749 747 entropy = 0.811 samples = 4 value = [1, 3] 746->747 748 entropy = 0.0 samples = 1 value = [0, 1] 746->748 753 lead_time <= 49.5 entropy = 0.146 samples = 48 value = [47, 1] 752->753 756 customer_type_Transient-Party <= 0.5 entropy = 0.82 samples = 47 value = [35, 12] 752->756 754 entropy = 0.0 samples = 1 value = [0, 1] 753->754 755 entropy = 0.0 samples = 47 value = [47, 0] 753->755 757 entropy = 0.0 samples = 6 value = [0, 6] 756->757 758 adr <= 62.4 entropy = 0.601 samples = 41 value = [35, 6] 756->758 759 arrival_date_week_number <= 25.0 entropy = 0.191 samples = 34 value = [33, 1] 758->759 762 adr <= 68.0 entropy = 0.863 samples = 7 value = [2, 5] 758->762 760 entropy = 0.0 samples = 1 value = [0, 1] 759->760 761 entropy = 0.0 samples = 33 value = [33, 0] 759->761 763 entropy = 0.0 samples = 5 value = [0, 5] 762->763 764 entropy = 0.0 samples = 2 value = [2, 0] 762->764 766 previous_cancellations <= 0.5 entropy = 0.991 samples = 9 value = [5, 4] 765->766 769 entropy = 0.0 samples = 21 value = [0, 21] 765->769 767 entropy = 0.0 samples = 4 value = [0, 4] 766->767 768 entropy = 0.0 samples = 5 value = [5, 0] 766->768 771 lead_time <= 213.0 entropy = 0.89 samples = 26 value = [18, 8] 770->771 784 agent_1 <= 0.5 entropy = 0.909 samples = 151 value = [49, 102] 770->784 772 reserved_room_type_E <= 0.5 entropy = 0.755 samples = 23 value = [18, 5] 771->772 783 entropy = 0.0 samples = 3 value = [0, 3] 771->783 773 stays_in_week_nights <= 11.0 entropy = 0.592 samples = 21 value = [18, 3] 772->773 782 entropy = 0.0 samples = 2 value = [0, 2] 772->782 774 distribution_channel_TA/TO <= 0.5 entropy = 0.469 samples = 20 value = [18, 2] 773->774 781 entropy = 0.0 samples = 1 value = [0, 1] 773->781 775 adr <= 81.0 entropy = 0.863 samples = 7 value = [5, 2] 774->775 780 entropy = 0.0 samples = 13 value = [13, 0] 774->780 776 assigned_room_type_B <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 775->776 779 entropy = 0.0 samples = 4 value = [4, 0] 775->779 777 entropy = 0.0 samples = 2 value = [0, 2] 776->777 778 entropy = 0.0 samples = 1 value = [1, 0] 776->778 785 adr <= 80.675 entropy = 0.932 samples = 141 value = [49, 92] 784->785 856 entropy = 0.0 samples = 10 value = [0, 10] 784->856 786 market_segment_Direct <= 0.5 entropy = 1.0 samples = 39 value = [20, 19] 785->786 809 adr <= 174.505 entropy = 0.861 samples = 102 value = [29, 73] 785->809 787 agent_unknown <= 0.5 entropy = 0.971 samples = 30 value = [12, 18] 786->787 806 stays_in_week_nights <= 1.5 entropy = 0.503 samples = 9 value = [8, 1] 786->806 788 arrival_date_month_September <= 0.5 entropy = 0.999 samples = 25 value = [12, 13] 787->788 805 entropy = 0.0 samples = 5 value = [0, 5] 787->805 789 lead_time <= 183.5 entropy = 0.994 samples = 22 value = [12, 10] 788->789 804 entropy = 0.0 samples = 3 value = [0, 3] 788->804 790 reserved_room_type_D <= 0.5 entropy = 0.998 samples = 19 value = [9, 10] 789->790 803 entropy = 0.0 samples = 3 value = [3, 0] 789->803 791 arrival_date_day_of_month <= 9.0 entropy = 0.998 samples = 17 value = [9, 8] 790->791 802 entropy = 0.0 samples = 2 value = [0, 2] 790->802 792 entropy = 0.0 samples = 2 value = [0, 2] 791->792 793 arrival_date_day_of_month <= 17.0 entropy = 0.971 samples = 15 value = [9, 6] 791->793 794 arrival_date_month_June <= 0.5 entropy = 0.592 samples = 7 value = [6, 1] 793->794 797 stays_in_week_nights <= 2.5 entropy = 0.954 samples = 8 value = [3, 5] 793->797 795 entropy = 0.0 samples = 5 value = [5, 0] 794->795 796 entropy = 1.0 samples = 2 value = [1, 1] 794->796 798 lead_time <= 64.0 entropy = 0.971 samples = 5 value = [3, 2] 797->798 801 entropy = 0.0 samples = 3 value = [0, 3] 797->801 799 entropy = 0.918 samples = 3 value = [1, 2] 798->799 800 entropy = 0.0 samples = 2 value = [2, 0] 798->800 807 entropy = 1.0 samples = 2 value = [1, 1] 806->807 808 entropy = 0.0 samples = 7 value = [7, 0] 806->808 810 arrival_date_year <= 2015.5 entropy = 0.776 samples = 83 value = [19, 64] 809->810 843 arrival_date_day_of_month <= 7.5 entropy = 0.998 samples = 19 value = [10, 9] 809->843 811 previous_cancellations <= 0.5 entropy = 0.276 samples = 21 value = [1, 20] 810->811 814 adr <= 143.1 entropy = 0.869 samples = 62 value = [18, 44] 810->814 812 entropy = 0.0 samples = 20 value = [0, 20] 811->812 813 entropy = 0.0 samples = 1 value = [1, 0] 811->813 815 total_of_special_requests <= 1.5 entropy = 0.782 samples = 56 value = [13, 43] 814->815 840 customer_type_Contract <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 814->840 816 adr <= 85.25 entropy = 0.867 samples = 45 value = [13, 32] 815->816 839 entropy = 0.0 samples = 11 value = [0, 11] 815->839 817 entropy = 0.0 samples = 7 value = [0, 7] 816->817 818 agent_listed_other <= 0.5 entropy = 0.927 samples = 38 value = [13, 25] 816->818 819 stays_in_week_nights <= 1.5 entropy = 0.722 samples = 5 value = [4, 1] 818->819 822 lead_time <= 96.5 entropy = 0.845 samples = 33 value = [9, 24] 818->822 820 entropy = 0.0 samples = 1 value = [0, 1] 819->820 821 entropy = 0.0 samples = 4 value = [4, 0] 819->821 823 arrival_date_day_of_month <= 19.0 entropy = 0.996 samples = 13 value = [7, 6] 822->823 832 arrival_date_week_number <= 31.5 entropy = 0.469 samples = 20 value = [2, 18] 822->832 824 arrival_date_month_June <= 0.5 entropy = 0.811 samples = 8 value = [2, 6] 823->824 831 entropy = 0.0 samples = 5 value = [5, 0] 823->831 825 entropy = 0.0 samples = 4 value = [0, 4] 824->825 826 arrival_date_week_number <= 24.5 entropy = 1.0 samples = 4 value = [2, 2] 824->826 827 customer_type_Transient-Party <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 826->827 830 entropy = 0.0 samples = 1 value = [1, 0] 826->830 828 entropy = 0.0 samples = 1 value = [0, 1] 827->828 829 entropy = 1.0 samples = 2 value = [1, 1] 827->829 833 entropy = 0.0 samples = 14 value = [0, 14] 832->833 834 arrival_date_week_number <= 37.5 entropy = 0.918 samples = 6 value = [2, 4] 832->834 835 stays_in_weekend_nights <= 1.0 entropy = 0.918 samples = 3 value = [2, 1] 834->835 838 entropy = 0.0 samples = 3 value = [0, 3] 834->838 836 entropy = 0.0 samples = 2 value = [2, 0] 835->836 837 entropy = 0.0 samples = 1 value = [0, 1] 835->837 841 entropy = 0.0 samples = 5 value = [5, 0] 840->841 842 entropy = 0.0 samples = 1 value = [0, 1] 840->842 844 entropy = 0.0 samples = 3 value = [3, 0] 843->844 845 arrival_date_week_number <= 34.5 entropy = 0.989 samples = 16 value = [7, 9] 843->845 846 arrival_date_day_of_month <= 13.0 entropy = 0.764 samples = 9 value = [2, 7] 845->846 851 market_segment_Groups <= 0.5 entropy = 0.863 samples = 7 value = [5, 2] 845->851 847 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 846->847 850 entropy = 0.0 samples = 6 value = [0, 6] 846->850 848 entropy = 0.0 samples = 1 value = [0, 1] 847->848 849 entropy = 0.0 samples = 2 value = [2, 0] 847->849 852 entropy = 0.0 samples = 4 value = [4, 0] 851->852 853 meal_SC <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 851->853 854 entropy = 0.0 samples = 2 value = [0, 2] 853->854 855 entropy = 0.0 samples = 1 value = [1, 0] 853->855 858 arrival_date_day_of_month <= 4.5 entropy = 0.673 samples = 181 value = [32, 149] 857->858 911 arrival_date_year <= 2015.5 entropy = 0.845 samples = 11 value = [8, 3] 857->911 859 entropy = 0.0 samples = 19 value = [0, 19] 858->859 860 customer_type_Contract <= 0.5 entropy = 0.717 samples = 162 value = [32, 130] 858->860 861 arrival_date_week_number <= 9.5 entropy = 0.696 samples = 160 value = [30, 130] 860->861 910 entropy = 0.0 samples = 2 value = [2, 0] 860->910 862 entropy = 0.0 samples = 17 value = [0, 17] 861->862 863 arrival_date_year <= 2015.5 entropy = 0.741 samples = 143 value = [30, 113] 861->863 864 reserved_room_type_D <= 0.5 entropy = 0.439 samples = 44 value = [4, 40] 863->864 877 total_of_special_requests <= 0.5 entropy = 0.831 samples = 99 value = [26, 73] 863->877 865 arrival_date_day_of_month <= 7.0 entropy = 0.365 samples = 43 value = [3, 40] 864->865 876 entropy = 0.0 samples = 1 value = [1, 0] 864->876 866 entropy = 0.0 samples = 1 value = [1, 0] 865->866 867 adr <= 62.5 entropy = 0.276 samples = 42 value = [2, 40] 865->867 868 arrival_date_week_number <= 38.5 entropy = 0.592 samples = 14 value = [2, 12] 867->868 875 entropy = 0.0 samples = 28 value = [0, 28] 867->875 869 arrival_date_day_of_month <= 13.0 entropy = 0.971 samples = 5 value = [2, 3] 868->869 874 entropy = 0.0 samples = 9 value = [0, 9] 868->874 870 entropy = 0.0 samples = 2 value = [0, 2] 869->870 871 market_segment_Groups <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 869->871 872 entropy = 0.0 samples = 1 value = [0, 1] 871->872 873 entropy = 0.0 samples = 2 value = [2, 0] 871->873 878 arrival_date_month_March <= 0.5 entropy = 0.923 samples = 68 value = [23, 45] 877->878 905 children <= 0.5 entropy = 0.459 samples = 31 value = [3, 28] 877->905 879 stays_in_week_nights <= 3.5 entropy = 0.977 samples = 56 value = [23, 33] 878->879 904 entropy = 0.0 samples = 12 value = [0, 12] 878->904 880 market_segment_Offline TA/TO <= 0.5 entropy = 1.0 samples = 43 value = [22, 21] 879->880 901 adults <= 2.5 entropy = 0.391 samples = 13 value = [1, 12] 879->901 881 arrival_date_day_of_month <= 29.5 entropy = 0.938 samples = 31 value = [20, 11] 880->881 896 arrival_date_year <= 2016.5 entropy = 0.65 samples = 12 value = [2, 10] 880->896 882 arrival_date_year <= 2016.5 entropy = 0.826 samples = 27 value = [20, 7] 881->882 895 entropy = 0.0 samples = 4 value = [0, 4] 881->895 883 arrival_date_week_number <= 43.5 entropy = 0.61 samples = 20 value = [17, 3] 882->883 890 hotel_Resort Hotel <= 0.5 entropy = 0.985 samples = 7 value = [3, 4] 882->890 884 entropy = 0.0 samples = 11 value = [11, 0] 883->884 885 adr <= 73.2 entropy = 0.918 samples = 9 value = [6, 3] 883->885 886 entropy = 0.0 samples = 2 value = [0, 2] 885->886 887 arrival_date_month_October <= 0.5 entropy = 0.592 samples = 7 value = [6, 1] 885->887 888 entropy = 0.0 samples = 6 value = [6, 0] 887->888 889 entropy = 0.0 samples = 1 value = [0, 1] 887->889 891 entropy = 0.0 samples = 3 value = [0, 3] 890->891 892 arrival_date_week_number <= 18.0 entropy = 0.811 samples = 4 value = [3, 1] 890->892 893 entropy = 0.0 samples = 1 value = [0, 1] 892->893 894 entropy = 0.0 samples = 3 value = [3, 0] 892->894 897 arrival_date_week_number <= 13.0 entropy = 0.439 samples = 11 value = [1, 10] 896->897 900 entropy = 0.0 samples = 1 value = [1, 0] 896->900 898 entropy = 0.0 samples = 1 value = [1, 0] 897->898 899 entropy = 0.0 samples = 10 value = [0, 10] 897->899 902 entropy = 0.0 samples = 12 value = [0, 12] 901->902 903 entropy = 0.0 samples = 1 value = [1, 0] 901->903 906 entropy = 0.0 samples = 24 value = [0, 24] 905->906 907 arrival_date_day_of_month <= 20.5 entropy = 0.985 samples = 7 value = [3, 4] 905->907 908 entropy = 0.0 samples = 4 value = [0, 4] 907->908 909 entropy = 0.0 samples = 3 value = [3, 0] 907->909 912 entropy = 0.0 samples = 2 value = [0, 2] 911->912 913 assigned_room_type_I <= 0.5 entropy = 0.503 samples = 9 value = [8, 1] 911->913 914 entropy = 0.0 samples = 8 value = [8, 0] 913->914 915 entropy = 0.0 samples = 1 value = [0, 1] 913->915 917 lead_time <= 8.5 entropy = 0.997 samples = 1935 value = [1025, 910] 916->917 1708 lead_time <= 10.5 entropy = 0.836 samples = 3140 value = [836, 2304] 916->1708 918 adr <= 88.2 entropy = 0.726 samples = 307 value = [62, 245] 917->918 1013 customer_type_Transient <= 0.5 entropy = 0.976 samples = 1628 value = [963, 665] 917->1013 919 arrival_date_day_of_month <= 5.5 entropy = 0.42 samples = 129 value = [11, 118] 918->919 946 lead_time <= 1.5 entropy = 0.864 samples = 178 value = [51, 127] 918->946 920 arrival_date_week_number <= 16.5 entropy = 0.863 samples = 21 value = [6, 15] 919->920 931 adr <= 66.15 entropy = 0.27 samples = 108 value = [5, 103] 919->931 921 adr <= 79.3 entropy = 0.971 samples = 15 value = [6, 9] 920->921 930 entropy = 0.0 samples = 6 value = [0, 6] 920->930 922 assigned_room_type_A <= 0.5 entropy = 0.994 samples = 11 value = [6, 5] 921->922 929 entropy = 0.0 samples = 4 value = [0, 4] 921->929 923 lead_time <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 922->923 926 stays_in_week_nights <= 2.5 entropy = 0.65 samples = 6 value = [5, 1] 922->926 924 entropy = 1.0 samples = 2 value = [1, 1] 923->924 925 entropy = 0.0 samples = 3 value = [0, 3] 923->925 927 entropy = 0.0 samples = 5 value = [5, 0] 926->927 928 entropy = 0.0 samples = 1 value = [0, 1] 926->928 932 entropy = 0.0 samples = 54 value = [0, 54] 931->932 933 adr <= 79.1 entropy = 0.445 samples = 54 value = [5, 49] 931->933 934 adr <= 78.76 entropy = 0.637 samples = 31 value = [5, 26] 933->934 945 entropy = 0.0 samples = 23 value = [0, 23] 933->945 935 adr <= 73.075 entropy = 0.48 samples = 29 value = [3, 26] 934->935 944 entropy = 0.0 samples = 2 value = [2, 0] 934->944 936 stays_in_weekend_nights <= 1.5 entropy = 0.779 samples = 13 value = [3, 10] 935->936 943 entropy = 0.0 samples = 16 value = [0, 16] 935->943 937 arrival_date_week_number <= 30.0 entropy = 0.954 samples = 8 value = [3, 5] 936->937 942 entropy = 0.0 samples = 5 value = [0, 5] 936->942 938 distribution_channel_TA/TO <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 937->938 941 entropy = 0.0 samples = 2 value = [2, 0] 937->941 939 entropy = 0.0 samples = 1 value = [1, 0] 938->939 940 entropy = 0.0 samples = 5 value = [0, 5] 938->940 947 arrival_date_day_of_month <= 8.5 entropy = 0.612 samples = 73 value = [11, 62] 946->947 966 arrival_date_week_number <= 32.5 entropy = 0.959 samples = 105 value = [40, 65] 946->966 948 entropy = 0.0 samples = 21 value = [0, 21] 947->948 949 assigned_room_type_A <= 0.5 entropy = 0.744 samples = 52 value = [11, 41] 947->949 950 meal_HB <= 0.5 entropy = 0.235 samples = 26 value = [1, 25] 949->950 953 meal_BB <= 0.5 entropy = 0.961 samples = 26 value = [10, 16] 949->953 951 entropy = 0.0 samples = 24 value = [0, 24] 950->951 952 entropy = 1.0 samples = 2 value = [1, 1] 950->952 954 arrival_date_day_of_month <= 12.5 entropy = 0.439 samples = 11 value = [1, 10] 953->954 957 arrival_date_month_February <= 0.5 entropy = 0.971 samples = 15 value = [9, 6] 953->957 955 entropy = 1.0 samples = 2 value = [1, 1] 954->955 956 entropy = 0.0 samples = 9 value = [0, 9] 954->956 958 arrival_date_week_number <= 27.5 entropy = 0.89 samples = 13 value = [9, 4] 957->958 965 entropy = 0.0 samples = 2 value = [0, 2] 957->965 959 entropy = 0.0 samples = 6 value = [6, 0] 958->959 960 adr <= 184.0 entropy = 0.985 samples = 7 value = [3, 4] 958->960 961 stays_in_weekend_nights <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 960->961 964 entropy = 0.0 samples = 2 value = [2, 0] 960->964 962 entropy = 0.0 samples = 3 value = [0, 3] 961->962 963 entropy = 1.0 samples = 2 value = [1, 1] 961->963 967 arrival_date_year <= 2016.5 entropy = 0.999 samples = 69 value = [33, 36] 966->967 1002 arrival_date_week_number <= 36.5 entropy = 0.711 samples = 36 value = [7, 29] 966->1002 968 stays_in_week_nights <= 3.5 entropy = 0.947 samples = 41 value = [26, 15] 967->968 993 assigned_room_type_D <= 0.5 entropy = 0.811 samples = 28 value = [7, 21] 967->993 969 adults <= 1.5 entropy = 0.985 samples = 35 value = [20, 15] 968->969 992 entropy = 0.0 samples = 6 value = [6, 0] 968->992 970 lead_time <= 6.5 entropy = 0.65 samples = 6 value = [1, 5] 969->970 973 arrival_date_week_number <= 30.0 entropy = 0.929 samples = 29 value = [19, 10] 969->973 971 entropy = 0.0 samples = 4 value = [0, 4] 970->971 972 entropy = 1.0 samples = 2 value = [1, 1] 970->972 974 arrival_date_day_of_month <= 27.0 entropy = 0.98 samples = 24 value = [14, 10] 973->974 991 entropy = 0.0 samples = 5 value = [5, 0] 973->991 975 arrival_date_week_number <= 18.5 entropy = 0.946 samples = 22 value = [14, 8] 974->975 990 entropy = 0.0 samples = 2 value = [0, 2] 974->990 976 arrival_date_day_of_month <= 24.5 entropy = 0.684 samples = 11 value = [9, 2] 975->976 981 arrival_date_week_number <= 21.5 entropy = 0.994 samples = 11 value = [5, 6] 975->981 977 customer_type_Transient <= 0.5 entropy = 0.469 samples = 10 value = [9, 1] 976->977 980 entropy = 0.0 samples = 1 value = [0, 1] 976->980 978 entropy = 1.0 samples = 2 value = [1, 1] 977->978 979 entropy = 0.0 samples = 8 value = [8, 0] 977->979 982 entropy = 0.0 samples = 2 value = [0, 2] 981->982 983 arrival_date_day_of_month <= 13.0 entropy = 0.991 samples = 9 value = [5, 4] 981->983 984 arrival_date_month_July <= 0.5 entropy = 0.918 samples = 6 value = [2, 4] 983->984 989 entropy = 0.0 samples = 3 value = [3, 0] 983->989 985 arrival_date_day_of_month <= 3.5 entropy = 0.918 samples = 3 value = [2, 1] 984->985 988 entropy = 0.0 samples = 3 value = [0, 3] 984->988 986 entropy = 0.0 samples = 1 value = [0, 1] 985->986 987 entropy = 0.0 samples = 2 value = [2, 0] 985->987 994 arrival_date_day_of_month <= 17.5 entropy = 0.964 samples = 18 value = [7, 11] 993->994 1001 entropy = 0.0 samples = 10 value = [0, 10] 993->1001 995 stays_in_week_nights <= 0.5 entropy = 0.946 samples = 11 value = [7, 4] 994->995 1000 entropy = 0.0 samples = 7 value = [0, 7] 994->1000 996 entropy = 0.0 samples = 2 value = [0, 2] 995->996 997 adults <= 1.5 entropy = 0.764 samples = 9 value = [7, 2] 995->997 998 entropy = 0.0 samples = 2 value = [0, 2] 997->998 999 entropy = 0.0 samples = 7 value = [7, 0] 997->999 1003 entropy = 0.0 samples = 13 value = [0, 13] 1002->1003 1004 previous_cancellations <= 0.5 entropy = 0.887 samples = 23 value = [7, 16] 1002->1004 1005 stays_in_weekend_nights <= 0.5 entropy = 0.792 samples = 21 value = [5, 16] 1004->1005 1012 entropy = 0.0 samples = 2 value = [2, 0] 1004->1012 1006 entropy = 0.0 samples = 10 value = [0, 10] 1005->1006 1007 adr <= 138.5 entropy = 0.994 samples = 11 value = [5, 6] 1005->1007 1008 arrival_date_day_of_month <= 25.5 entropy = 0.592 samples = 7 value = [1, 6] 1007->1008 1011 entropy = 0.0 samples = 4 value = [4, 0] 1007->1011 1009 entropy = 0.0 samples = 5 value = [0, 5] 1008->1009 1010 entropy = 1.0 samples = 2 value = [1, 1] 1008->1010 1014 previous_cancellations <= 0.5 entropy = 0.667 samples = 109 value = [19, 90] 1013->1014 1045 lead_time <= 203.5 entropy = 0.957 samples = 1519 value = [944, 575] 1013->1045 1015 arrival_date_week_number <= 19.5 entropy = 0.526 samples = 101 value = [12, 89] 1014->1015 1042 arrival_date_month_August <= 0.5 entropy = 0.544 samples = 8 value = [7, 1] 1014->1042 1016 entropy = 0.0 samples = 32 value = [0, 32] 1015->1016 1017 lead_time <= 46.5 entropy = 0.667 samples = 69 value = [12, 57] 1015->1017 1018 entropy = 0.0 samples = 15 value = [0, 15] 1017->1018 1019 adr <= 86.69 entropy = 0.764 samples = 54 value = [12, 42] 1017->1019 1020 stays_in_week_nights <= 2.5 entropy = 0.991 samples = 9 value = [5, 4] 1019->1020 1025 arrival_date_day_of_month <= 23.5 entropy = 0.624 samples = 45 value = [7, 38] 1019->1025 1021 entropy = 0.0 samples = 3 value = [3, 0] 1020->1021 1022 booking_changes <= 1.5 entropy = 0.918 samples = 6 value = [2, 4] 1020->1022 1023 entropy = 0.0 samples = 4 value = [0, 4] 1022->1023 1024 entropy = 0.0 samples = 2 value = [2, 0] 1022->1024 1026 arrival_date_week_number <= 20.5 entropy = 0.414 samples = 36 value = [3, 33] 1025->1026 1035 arrival_date_day_of_month <= 26.0 entropy = 0.991 samples = 9 value = [4, 5] 1025->1035 1027 entropy = 0.0 samples = 1 value = [1, 0] 1026->1027 1028 stays_in_week_nights <= 1.5 entropy = 0.316 samples = 35 value = [2, 33] 1026->1028 1029 adr <= 118.35 entropy = 0.684 samples = 11 value = [2, 9] 1028->1029 1034 entropy = 0.0 samples = 24 value = [0, 24] 1028->1034 1030 entropy = 0.0 samples = 7 value = [0, 7] 1029->1030 1031 arrival_date_day_of_month <= 16.5 entropy = 1.0 samples = 4 value = [2, 2] 1029->1031 1032 entropy = 0.0 samples = 2 value = [2, 0] 1031->1032 1033 entropy = 0.0 samples = 2 value = [0, 2] 1031->1033 1036 entropy = 0.0 samples = 2 value = [2, 0] 1035->1036 1037 arrival_date_day_of_month <= 29.5 entropy = 0.863 samples = 7 value = [2, 5] 1035->1037 1038 entropy = 0.0 samples = 4 value = [0, 4] 1037->1038 1039 reserved_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1037->1039 1040 entropy = 0.0 samples = 2 value = [2, 0] 1039->1040 1041 entropy = 0.0 samples = 1 value = [0, 1] 1039->1041 1043 entropy = 0.0 samples = 7 value = [7, 0] 1042->1043 1044 entropy = 0.0 samples = 1 value = [0, 1] 1042->1044 1046 booking_changes <= 0.5 entropy = 0.93 samples = 1366 value = [894, 472] 1045->1046 1631 country_PRT <= 0.5 entropy = 0.912 samples = 153 value = [50, 103] 1045->1631 1047 adr <= 137.705 entropy = 0.905 samples = 1241 value = [843, 398] 1046->1047 1568 adr <= 124.675 entropy = 0.975 samples = 125 value = [51, 74] 1046->1568 1048 adr <= 30.4 entropy = 0.948 samples = 857 value = [543, 314] 1047->1048 1423 arrival_date_month_September <= 0.5 entropy = 0.758 samples = 384 value = [300, 84] 1047->1423 1049 entropy = 0.0 samples = 9 value = [0, 9] 1048->1049 1050 country_PRT <= 0.5 entropy = 0.942 samples = 848 value = [543, 305] 1048->1050 1051 arrival_date_year <= 2015.5 entropy = 0.967 samples = 710 value = [431, 279] 1050->1051 1378 lead_time <= 49.5 entropy = 0.698 samples = 138 value = [112, 26] 1050->1378 1052 lead_time <= 12.5 entropy = 0.402 samples = 25 value = [2, 23] 1051->1052 1055 arrival_date_year <= 2016.5 entropy = 0.953 samples = 685 value = [429, 256] 1051->1055 1053 entropy = 0.0 samples = 2 value = [2, 0] 1052->1053 1054 entropy = 0.0 samples = 23 value = [0, 23] 1052->1054 1056 adr <= 73.485 entropy = 0.891 samples = 399 value = [276, 123] 1055->1056 1233 stays_in_week_nights <= 5.5 entropy = 0.996 samples = 286 value = [153, 133] 1055->1233 1057 arrival_date_week_number <= 39.0 entropy = 0.993 samples = 62 value = [28, 34] 1056->1057 1086 lead_time <= 67.5 entropy = 0.833 samples = 337 value = [248, 89] 1056->1086 1058 arrival_date_month_February <= 0.5 entropy = 0.824 samples = 31 value = [8, 23] 1057->1058 1069 lead_time <= 85.0 entropy = 0.938 samples = 31 value = [20, 11] 1057->1069 1059 lead_time <= 65.5 entropy = 0.605 samples = 27 value = [4, 23] 1058->1059 1068 entropy = 0.0 samples = 4 value = [4, 0] 1058->1068 1060 arrival_date_day_of_month <= 24.5 entropy = 0.863 samples = 14 value = [4, 10] 1059->1060 1067 entropy = 0.0 samples = 13 value = [0, 13] 1059->1067 1061 lead_time <= 58.5 entropy = 0.65 samples = 12 value = [2, 10] 1060->1061 1066 entropy = 0.0 samples = 2 value = [2, 0] 1060->1066 1062 entropy = 0.0 samples = 8 value = [0, 8] 1061->1062 1063 adr <= 65.625 entropy = 1.0 samples = 4 value = [2, 2] 1061->1063 1064 entropy = 0.0 samples = 2 value = [2, 0] 1063->1064 1065 entropy = 0.0 samples = 2 value = [0, 2] 1063->1065 1070 stays_in_week_nights <= 4.0 entropy = 0.971 samples = 15 value = [6, 9] 1069->1070 1081 hotel_City Hotel <= 0.5 entropy = 0.544 samples = 16 value = [14, 2] 1069->1081 1071 arrival_date_day_of_month <= 6.5 entropy = 0.994 samples = 11 value = [6, 5] 1070->1071 1080 entropy = 0.0 samples = 4 value = [0, 4] 1070->1080 1072 entropy = 0.0 samples = 3 value = [0, 3] 1071->1072 1073 stays_in_week_nights <= 1.5 entropy = 0.811 samples = 8 value = [6, 2] 1071->1073 1074 country_listed_other <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 1073->1074 1079 entropy = 0.0 samples = 4 value = [4, 0] 1073->1079 1075 entropy = 0.0 samples = 1 value = [0, 1] 1074->1075 1076 arrival_date_week_number <= 52.5 entropy = 0.918 samples = 3 value = [2, 1] 1074->1076 1077 entropy = 0.0 samples = 2 value = [2, 0] 1076->1077 1078 entropy = 0.0 samples = 1 value = [0, 1] 1076->1078 1082 reserved_room_type_E <= 0.5 entropy = 0.371 samples = 14 value = [13, 1] 1081->1082 1085 entropy = 1.0 samples = 2 value = [1, 1] 1081->1085 1083 entropy = 0.0 samples = 12 value = [12, 0] 1082->1083 1084 entropy = 1.0 samples = 2 value = [1, 1] 1082->1084 1087 assigned_room_type_E <= 0.5 entropy = 0.936 samples = 159 value = [103, 56] 1086->1087 1172 meal_BB <= 0.5 entropy = 0.692 samples = 178 value = [145, 33] 1086->1172 1088 arrival_date_week_number <= 30.5 entropy = 0.916 samples = 154 value = [103, 51] 1087->1088 1171 entropy = 0.0 samples = 5 value = [0, 5] 1087->1171 1089 assigned_room_type_A <= 0.5 entropy = 0.827 samples = 100 value = [74, 26] 1088->1089 1136 stays_in_week_nights <= 1.5 entropy = 0.996 samples = 54 value = [29, 25] 1088->1136 1090 reserved_room_type_A <= 0.5 entropy = 0.971 samples = 25 value = [15, 10] 1089->1090 1101 stays_in_week_nights <= 3.5 entropy = 0.748 samples = 75 value = [59, 16] 1089->1101 1091 arrival_date_week_number <= 10.5 entropy = 0.811 samples = 20 value = [15, 5] 1090->1091 1100 entropy = 0.0 samples = 5 value = [0, 5] 1090->1100 1092 entropy = 0.0 samples = 2 value = [0, 2] 1091->1092 1093 arrival_date_day_of_month <= 16.5 entropy = 0.65 samples = 18 value = [15, 3] 1091->1093 1094 entropy = 0.0 samples = 9 value = [9, 0] 1093->1094 1095 lead_time <= 45.5 entropy = 0.918 samples = 9 value = [6, 3] 1093->1095 1096 stays_in_week_nights <= 1.5 entropy = 0.811 samples = 4 value = [1, 3] 1095->1096 1099 entropy = 0.0 samples = 5 value = [5, 0] 1095->1099 1097 entropy = 0.0 samples = 1 value = [1, 0] 1096->1097 1098 entropy = 0.0 samples = 3 value = [0, 3] 1096->1098 1102 arrival_date_day_of_month <= 9.0 entropy = 0.799 samples = 66 value = [50, 16] 1101->1102 1135 entropy = 0.0 samples = 9 value = [9, 0] 1101->1135 1103 arrival_date_week_number <= 10.5 entropy = 0.353 samples = 15 value = [14, 1] 1102->1103 1108 arrival_date_day_of_month <= 10.5 entropy = 0.874 samples = 51 value = [36, 15] 1102->1108 1104 stays_in_weekend_nights <= 1.5 entropy = 0.918 samples = 3 value = [2, 1] 1103->1104 1107 entropy = 0.0 samples = 12 value = [12, 0] 1103->1107 1105 entropy = 0.0 samples = 1 value = [0, 1] 1104->1105 1106 entropy = 0.0 samples = 2 value = [2, 0] 1104->1106 1109 entropy = 0.0 samples = 2 value = [0, 2] 1108->1109 1110 lead_time <= 13.5 entropy = 0.835 samples = 49 value = [36, 13] 1108->1110 1111 adr <= 91.0 entropy = 0.985 samples = 7 value = [3, 4] 1110->1111 1116 lead_time <= 56.5 entropy = 0.75 samples = 42 value = [33, 9] 1110->1116 1112 entropy = 0.0 samples = 2 value = [2, 0] 1111->1112 1113 stays_in_week_nights <= 2.5 entropy = 0.722 samples = 5 value = [1, 4] 1111->1113 1114 entropy = 0.0 samples = 4 value = [0, 4] 1113->1114 1115 entropy = 0.0 samples = 1 value = [1, 0] 1113->1115 1117 country_DEU <= 0.5 entropy = 0.555 samples = 31 value = [27, 4] 1116->1117 1130 arrival_date_week_number <= 16.5 entropy = 0.994 samples = 11 value = [6, 5] 1116->1130 1118 arrival_date_day_of_month <= 14.0 entropy = 0.469 samples = 30 value = [27, 3] 1117->1118 1129 entropy = 0.0 samples = 1 value = [0, 1] 1117->1129 1119 entropy = 0.0 samples = 12 value = [12, 0] 1118->1119 1120 arrival_date_week_number <= 28.0 entropy = 0.65 samples = 18 value = [15, 3] 1118->1120 1121 lead_time <= 29.0 entropy = 0.523 samples = 17 value = [15, 2] 1120->1121 1128 entropy = 0.0 samples = 1 value = [0, 1] 1120->1128 1122 lead_time <= 22.5 entropy = 0.863 samples = 7 value = [5, 2] 1121->1122 1127 entropy = 0.0 samples = 10 value = [10, 0] 1121->1127 1123 arrival_date_month_February <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 1122->1123 1126 entropy = 0.0 samples = 1 value = [0, 1] 1122->1126 1124 entropy = 0.0 samples = 5 value = [5, 0] 1123->1124 1125 entropy = 0.0 samples = 1 value = [0, 1] 1123->1125 1131 adr <= 77.615 entropy = 0.65 samples = 6 value = [1, 5] 1130->1131 1134 entropy = 0.0 samples = 5 value = [5, 0] 1130->1134 1132 entropy = 0.0 samples = 1 value = [1, 0] 1131->1132 1133 entropy = 0.0 samples = 5 value = [0, 5] 1131->1133 1137 adr <= 123.275 entropy = 0.863 samples = 21 value = [6, 15] 1136->1137 1146 arrival_date_week_number <= 46.5 entropy = 0.885 samples = 33 value = [23, 10] 1136->1146 1138 arrival_date_month_December <= 0.5 entropy = 0.985 samples = 14 value = [6, 8] 1137->1138 1145 entropy = 0.0 samples = 7 value = [0, 7] 1137->1145 1139 country_listed_other <= 0.5 entropy = 0.971 samples = 10 value = [6, 4] 1138->1139 1144 entropy = 0.0 samples = 4 value = [0, 4] 1138->1144 1140 entropy = 0.0 samples = 2 value = [0, 2] 1139->1140 1141 arrival_date_week_number <= 35.0 entropy = 0.811 samples = 8 value = [6, 2] 1139->1141 1142 entropy = 0.0 samples = 2 value = [0, 2] 1141->1142 1143 entropy = 0.0 samples = 6 value = [6, 0] 1141->1143 1147 adr <= 129.23 entropy = 1.0 samples = 14 value = [7, 7] 1146->1147 1158 arrival_date_day_of_month <= 24.0 entropy = 0.629 samples = 19 value = [16, 3] 1146->1158 1148 assigned_room_type_A <= 0.5 entropy = 0.98 samples = 12 value = [5, 7] 1147->1148 1157 entropy = 0.0 samples = 2 value = [2, 0] 1147->1157 1149 entropy = 0.0 samples = 2 value = [0, 2] 1148->1149 1150 arrival_date_day_of_month <= 30.5 entropy = 1.0 samples = 10 value = [5, 5] 1148->1150 1151 arrival_date_day_of_month <= 21.5 entropy = 0.954 samples = 8 value = [5, 3] 1150->1151 1156 entropy = 0.0 samples = 2 value = [0, 2] 1150->1156 1152 lead_time <= 51.0 entropy = 0.811 samples = 4 value = [1, 3] 1151->1152 1155 entropy = 0.0 samples = 4 value = [4, 0] 1151->1155 1153 entropy = 0.0 samples = 3 value = [0, 3] 1152->1153 1154 entropy = 0.0 samples = 1 value = [1, 0] 1152->1154 1159 arrival_date_day_of_month <= 21.5 entropy = 0.75 samples = 14 value = [11, 3] 1158->1159 1170 entropy = 0.0 samples = 5 value = [5, 0] 1158->1170 1160 meal_BB <= 0.5 entropy = 0.619 samples = 13 value = [11, 2] 1159->1160 1169 entropy = 0.0 samples = 1 value = [0, 1] 1159->1169 1161 stays_in_weekend_nights <= 0.5 entropy = 0.811 samples = 8 value = [6, 2] 1160->1161 1168 entropy = 0.0 samples = 5 value = [5, 0] 1160->1168 1162 adr <= 90.335 entropy = 1.0 samples = 4 value = [2, 2] 1161->1162 1167 entropy = 0.0 samples = 4 value = [4, 0] 1161->1167 1163 country_FRA <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1162->1163 1166 entropy = 0.0 samples = 1 value = [0, 1] 1162->1166 1164 entropy = 0.0 samples = 2 value = [2, 0] 1163->1164 1165 entropy = 0.0 samples = 1 value = [0, 1] 1163->1165 1173 arrival_date_month_July <= 0.5 entropy = 0.391 samples = 52 value = [48, 4] 1172->1173 1184 arrival_date_day_of_month <= 2.5 entropy = 0.778 samples = 126 value = [97, 29] 1172->1184 1174 children <= 0.5 entropy = 0.156 samples = 44 value = [43, 1] 1173->1174 1177 lead_time <= 111.0 entropy = 0.954 samples = 8 value = [5, 3] 1173->1177 1175 entropy = 0.0 samples = 43 value = [43, 0] 1174->1175 1176 entropy = 0.0 samples = 1 value = [0, 1] 1174->1176 1178 entropy = 0.0 samples = 2 value = [2, 0] 1177->1178 1179 arrival_date_day_of_month <= 15.0 entropy = 1.0 samples = 6 value = [3, 3] 1177->1179 1180 arrival_date_day_of_month <= 3.5 entropy = 0.811 samples = 4 value = [1, 3] 1179->1180 1183 entropy = 0.0 samples = 2 value = [2, 0] 1179->1183 1181 entropy = 0.0 samples = 1 value = [1, 0] 1180->1181 1182 entropy = 0.0 samples = 3 value = [0, 3] 1180->1182 1185 entropy = 0.0 samples = 11 value = [11, 0] 1184->1185 1186 lead_time <= 77.5 entropy = 0.815 samples = 115 value = [86, 29] 1184->1186 1187 entropy = 0.0 samples = 10 value = [10, 0] 1186->1187 1188 lead_time <= 160.5 entropy = 0.85 samples = 105 value = [76, 29] 1186->1188 1189 lead_time <= 149.5 entropy = 0.906 samples = 84 value = [57, 27] 1188->1189 1228 adr <= 129.795 entropy = 0.454 samples = 21 value = [19, 2] 1188->1228 1190 lead_time <= 141.5 entropy = 0.863 samples = 77 value = [55, 22] 1189->1190 1223 arrival_date_month_June <= 0.5 entropy = 0.863 samples = 7 value = [2, 5] 1189->1223 1191 arrival_date_day_of_month <= 12.5 entropy = 0.908 samples = 68 value = [46, 22] 1190->1191 1222 entropy = 0.0 samples = 9 value = [9, 0] 1190->1222 1192 assigned_room_type_A <= 0.5 entropy = 0.998 samples = 21 value = [11, 10] 1191->1192 1203 arrival_date_week_number <= 35.5 entropy = 0.82 samples = 47 value = [35, 12] 1191->1203 1193 arrival_date_month_December <= 0.5 entropy = 0.592 samples = 7 value = [1, 6] 1192->1193 1196 lead_time <= 122.5 entropy = 0.863 samples = 14 value = [10, 4] 1192->1196 1194 entropy = 0.0 samples = 5 value = [0, 5] 1193->1194 1195 entropy = 1.0 samples = 2 value = [1, 1] 1193->1195 1197 adr <= 119.775 entropy = 0.65 samples = 12 value = [10, 2] 1196->1197 1202 entropy = 0.0 samples = 2 value = [0, 2] 1196->1202 1198 entropy = 0.0 samples = 8 value = [8, 0] 1197->1198 1199 lead_time <= 91.0 entropy = 1.0 samples = 4 value = [2, 2] 1197->1199 1200 entropy = 0.0 samples = 2 value = [0, 2] 1199->1200 1201 entropy = 0.0 samples = 2 value = [2, 0] 1199->1201 1204 adr <= 92.015 entropy = 0.909 samples = 37 value = [25, 12] 1203->1204 1221 entropy = 0.0 samples = 10 value = [10, 0] 1203->1221 1205 arrival_date_day_of_month <= 24.5 entropy = 0.863 samples = 7 value = [2, 5] 1204->1205 1210 stays_in_week_nights <= 2.5 entropy = 0.784 samples = 30 value = [23, 7] 1204->1210 1206 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1205->1206 1209 entropy = 0.0 samples = 4 value = [0, 4] 1205->1209 1207 entropy = 0.0 samples = 2 value = [2, 0] 1206->1207 1208 entropy = 0.0 samples = 1 value = [0, 1] 1206->1208 1211 adr <= 131.55 entropy = 0.937 samples = 17 value = [11, 6] 1210->1211 1218 adr <= 96.69 entropy = 0.391 samples = 13 value = [12, 1] 1210->1218 1212 arrival_date_month_August <= 0.5 entropy = 0.75 samples = 14 value = [11, 3] 1211->1212 1217 entropy = 0.0 samples = 3 value = [0, 3] 1211->1217 1213 lead_time <= 126.5 entropy = 0.414 samples = 12 value = [11, 1] 1212->1213 1216 entropy = 0.0 samples = 2 value = [0, 2] 1212->1216 1214 entropy = 0.0 samples = 10 value = [10, 0] 1213->1214 1215 entropy = 1.0 samples = 2 value = [1, 1] 1213->1215 1219 entropy = 0.0 samples = 1 value = [0, 1] 1218->1219 1220 entropy = 0.0 samples = 12 value = [12, 0] 1218->1220 1224 adr <= 117.75 entropy = 0.65 samples = 6 value = [1, 5] 1223->1224 1227 entropy = 0.0 samples = 1 value = [1, 0] 1223->1227 1225 entropy = 0.0 samples = 5 value = [0, 5] 1224->1225 1226 entropy = 0.0 samples = 1 value = [1, 0] 1224->1226 1229 adr <= 89.445 entropy = 0.286 samples = 20 value = [19, 1] 1228->1229 1232 entropy = 0.0 samples = 1 value = [0, 1] 1228->1232 1230 entropy = 1.0 samples = 2 value = [1, 1] 1229->1230 1231 entropy = 0.0 samples = 18 value = [18, 0] 1229->1231 1234 lead_time <= 12.5 entropy = 0.999 samples = 278 value = [145, 133] 1233->1234 1377 entropy = 0.0 samples = 8 value = [8, 0] 1233->1377 1235 entropy = 0.0 samples = 8 value = [0, 8] 1234->1235 1236 lead_time <= 185.0 entropy = 0.996 samples = 270 value = [145, 125] 1234->1236 1237 arrival_date_week_number <= 3.5 entropy = 0.998 samples = 264 value = [139, 125] 1236->1237 1376 entropy = 0.0 samples = 6 value = [6, 0] 1236->1376 1238 stays_in_week_nights <= 1.5 entropy = 0.619 samples = 13 value = [11, 2] 1237->1238 1243 arrival_date_day_of_month <= 28.5 entropy = 1.0 samples = 251 value = [128, 123] 1237->1243 1239 arrival_date_week_number <= 2.5 entropy = 1.0 samples = 4 value = [2, 2] 1238->1239 1242 entropy = 0.0 samples = 9 value = [9, 0] 1238->1242 1240 entropy = 0.0 samples = 2 value = [0, 2] 1239->1240 1241 entropy = 0.0 samples = 2 value = [2, 0] 1239->1241 1244 adr <= 135.215 entropy = 1.0 samples = 232 value = [113, 119] 1243->1244 1365 lead_time <= 91.5 entropy = 0.742 samples = 19 value = [15, 4] 1243->1365 1245 adr <= 113.25 entropy = 1.0 samples = 229 value = [113, 116] 1244->1245 1364 entropy = 0.0 samples = 3 value = [0, 3] 1244->1364 1246 arrival_date_week_number <= 29.5 entropy = 0.987 samples = 134 value = [58, 76] 1245->1246 1315 children <= 0.5 entropy = 0.982 samples = 95 value = [55, 40] 1245->1315 1247 lead_time <= 173.5 entropy = 0.995 samples = 127 value = [58, 69] 1246->1247 1314 entropy = 0.0 samples = 7 value = [0, 7] 1246->1314 1248 lead_time <= 158.0 entropy = 0.988 samples = 122 value = [53, 69] 1247->1248 1313 entropy = 0.0 samples = 5 value = [5, 0] 1247->1313 1249 assigned_room_type_G <= 0.5 entropy = 0.994 samples = 117 value = [53, 64] 1248->1249 1312 entropy = 0.0 samples = 5 value = [0, 5] 1248->1312 1250 arrival_date_day_of_month <= 16.5 entropy = 0.989 samples = 114 value = [50, 64] 1249->1250 1311 entropy = 0.0 samples = 3 value = [3, 0] 1249->1311 1251 lead_time <= 134.5 entropy = 0.938 samples = 62 value = [22, 40] 1250->1251 1282 stays_in_weekend_nights <= 1.5 entropy = 0.996 samples = 52 value = [28, 24] 1250->1282 1252 lead_time <= 106.5 entropy = 0.877 samples = 54 value = [16, 38] 1251->1252 1279 arrival_date_week_number <= 28.0 entropy = 0.811 samples = 8 value = [6, 2] 1251->1279 1253 arrival_date_week_number <= 19.5 entropy = 0.925 samples = 47 value = [16, 31] 1252->1253 1278 entropy = 0.0 samples = 7 value = [0, 7] 1252->1278 1254 arrival_date_month_May <= 0.5 entropy = 0.952 samples = 43 value = [16, 27] 1253->1254 1277 entropy = 0.0 samples = 4 value = [0, 4] 1253->1277 1255 arrival_date_day_of_month <= 2.5 entropy = 0.926 samples = 41 value = [14, 27] 1254->1255 1276 entropy = 0.0 samples = 2 value = [2, 0] 1254->1276 1256 entropy = 0.0 samples = 6 value = [0, 6] 1255->1256 1257 reserved_room_type_E <= 0.5 entropy = 0.971 samples = 35 value = [14, 21] 1255->1257 1258 lead_time <= 59.5 entropy = 0.946 samples = 33 value = [12, 21] 1257->1258 1275 entropy = 0.0 samples = 2 value = [2, 0] 1257->1275 1259 adr <= 75.735 entropy = 0.999 samples = 23 value = [11, 12] 1258->1259 1272 arrival_date_week_number <= 7.5 entropy = 0.469 samples = 10 value = [1, 9] 1258->1272 1260 entropy = 0.0 samples = 4 value = [4, 0] 1259->1260 1261 lead_time <= 45.5 entropy = 0.949 samples = 19 value = [7, 12] 1259->1261 1262 arrival_date_week_number <= 9.5 entropy = 0.619 samples = 13 value = [2, 11] 1261->1262 1269 lead_time <= 58.0 entropy = 0.65 samples = 6 value = [5, 1] 1261->1269 1263 arrival_date_week_number <= 6.5 entropy = 0.971 samples = 5 value = [2, 3] 1262->1263 1268 entropy = 0.0 samples = 8 value = [0, 8] 1262->1268 1264 entropy = 0.0 samples = 2 value = [0, 2] 1263->1264 1265 country_DEU <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1263->1265 1266 entropy = 0.0 samples = 2 value = [2, 0] 1265->1266 1267 entropy = 0.0 samples = 1 value = [0, 1] 1265->1267 1270 entropy = 0.0 samples = 5 value = [5, 0] 1269->1270 1271 entropy = 0.0 samples = 1 value = [0, 1] 1269->1271 1273 entropy = 0.0 samples = 1 value = [1, 0] 1272->1273 1274 entropy = 0.0 samples = 9 value = [0, 9] 1272->1274 1280 entropy = 0.0 samples = 6 value = [6, 0] 1279->1280 1281 entropy = 0.0 samples = 2 value = [0, 2] 1279->1281 1283 adr <= 67.44 entropy = 0.937 samples = 34 value = [22, 12] 1282->1283 1304 hotel_City Hotel <= 0.5 entropy = 0.918 samples = 18 value = [6, 12] 1282->1304 1284 meal_BB <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 1283->1284 1287 distribution_channel_TA/TO <= 0.5 entropy = 0.85 samples = 29 value = [21, 8] 1283->1287 1285 entropy = 1.0 samples = 2 value = [1, 1] 1284->1285 1286 entropy = 0.0 samples = 3 value = [0, 3] 1284->1286 1288 entropy = 0.0 samples = 2 value = [0, 2] 1287->1288 1289 arrival_date_month_March <= 0.5 entropy = 0.764 samples = 27 value = [21, 6] 1287->1289 1290 country_listed_other <= 0.5 entropy = 0.881 samples = 20 value = [14, 6] 1289->1290 1303 entropy = 0.0 samples = 7 value = [7, 0] 1289->1303 1291 entropy = 0.0 samples = 5 value = [5, 0] 1290->1291 1292 lead_time <= 71.0 entropy = 0.971 samples = 15 value = [9, 6] 1290->1292 1293 lead_time <= 27.5 entropy = 0.764 samples = 9 value = [7, 2] 1292->1293 1298 stays_in_week_nights <= 2.5 entropy = 0.918 samples = 6 value = [2, 4] 1292->1298 1294 stays_in_week_nights <= 1.5 entropy = 1.0 samples = 4 value = [2, 2] 1293->1294 1297 entropy = 0.0 samples = 5 value = [5, 0] 1293->1297 1295 entropy = 0.0 samples = 2 value = [0, 2] 1294->1295 1296 entropy = 0.0 samples = 2 value = [2, 0] 1294->1296 1299 entropy = 0.0 samples = 3 value = [0, 3] 1298->1299 1300 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1298->1300 1301 entropy = 0.0 samples = 2 value = [2, 0] 1300->1301 1302 entropy = 0.0 samples = 1 value = [0, 1] 1300->1302 1305 entropy = 0.0 samples = 3 value = [3, 0] 1304->1305 1306 arrival_date_month_June <= 0.5 entropy = 0.722 samples = 15 value = [3, 12] 1304->1306 1307 entropy = 0.0 samples = 11 value = [0, 11] 1306->1307 1308 lead_time <= 140.5 entropy = 0.811 samples = 4 value = [3, 1] 1306->1308 1309 entropy = 0.0 samples = 3 value = [3, 0] 1308->1309 1310 entropy = 0.0 samples = 1 value = [0, 1] 1308->1310 1316 adults <= 1.5 entropy = 0.968 samples = 91 value = [55, 36] 1315->1316 1363 entropy = 0.0 samples = 4 value = [0, 4] 1315->1363 1317 entropy = 0.0 samples = 6 value = [6, 0] 1316->1317 1318 arrival_date_day_of_month <= 24.5 entropy = 0.983 samples = 85 value = [49, 36] 1316->1318 1319 country_listed_other <= 0.5 entropy = 0.959 samples = 76 value = [47, 29] 1318->1319 1360 adr <= 121.5 entropy = 0.764 samples = 9 value = [2, 7] 1318->1360 1320 adr <= 122.4 entropy = 0.94 samples = 14 value = [5, 9] 1319->1320 1329 arrival_date_week_number <= 21.5 entropy = 0.907 samples = 62 value = [42, 20] 1319->1329 1321 entropy = 0.0 samples = 4 value = [0, 4] 1320->1321 1322 arrival_date_month_May <= 0.5 entropy = 1.0 samples = 10 value = [5, 5] 1320->1322 1323 country_FRA <= 0.5 entropy = 0.954 samples = 8 value = [3, 5] 1322->1323 1328 entropy = 0.0 samples = 2 value = [2, 0] 1322->1328 1324 adr <= 128.25 entropy = 0.971 samples = 5 value = [3, 2] 1323->1324 1327 entropy = 0.0 samples = 3 value = [0, 3] 1323->1327 1325 entropy = 0.0 samples = 2 value = [0, 2] 1324->1325 1326 entropy = 0.0 samples = 3 value = [3, 0] 1324->1326 1330 lead_time <= 103.5 entropy = 0.981 samples = 43 value = [25, 18] 1329->1330 1353 lead_time <= 178.0 entropy = 0.485 samples = 19 value = [17, 2] 1329->1353 1331 lead_time <= 66.0 entropy = 0.999 samples = 33 value = [17, 16] 1330->1331 1348 lead_time <= 136.0 entropy = 0.722 samples = 10 value = [8, 2] 1330->1348 1332 adr <= 131.05 entropy = 0.837 samples = 15 value = [11, 4] 1331->1332 1339 arrival_date_day_of_month <= 17.5 entropy = 0.918 samples = 18 value = [6, 12] 1331->1339 1333 arrival_date_week_number <= 10.5 entropy = 0.619 samples = 13 value = [11, 2] 1332->1333 1338 entropy = 0.0 samples = 2 value = [0, 2] 1332->1338 1334 lead_time <= 32.0 entropy = 0.918 samples = 3 value = [1, 2] 1333->1334 1337 entropy = 0.0 samples = 10 value = [10, 0] 1333->1337 1335 entropy = 0.0 samples = 1 value = [1, 0] 1334->1335 1336 entropy = 0.0 samples = 2 value = [0, 2] 1334->1336 1340 stays_in_week_nights <= 4.5 entropy = 0.722 samples = 15 value = [3, 12] 1339->1340 1347 entropy = 0.0 samples = 3 value = [3, 0] 1339->1347 1341 adr <= 132.75 entropy = 0.414 samples = 12 value = [1, 11] 1340->1341 1344 lead_time <= 79.5 entropy = 0.918 samples = 3 value = [2, 1] 1340->1344 1342 entropy = 0.0 samples = 10 value = [0, 10] 1341->1342 1343 entropy = 1.0 samples = 2 value = [1, 1] 1341->1343 1345 entropy = 0.0 samples = 1 value = [0, 1] 1344->1345 1346 entropy = 0.0 samples = 2 value = [2, 0] 1344->1346 1349 entropy = 0.0 samples = 7 value = [7, 0] 1348->1349 1350 arrival_date_month_April <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 1348->1350 1351 entropy = 0.0 samples = 2 value = [0, 2] 1350->1351 1352 entropy = 0.0 samples = 1 value = [1, 0] 1350->1352 1354 arrival_date_day_of_month <= 20.5 entropy = 0.31 samples = 18 value = [17, 1] 1353->1354 1359 entropy = 0.0 samples = 1 value = [0, 1] 1353->1359 1355 entropy = 0.0 samples = 15 value = [15, 0] 1354->1355 1356 lead_time <= 108.5 entropy = 0.918 samples = 3 value = [2, 1] 1354->1356 1357 entropy = 0.0 samples = 2 value = [2, 0] 1356->1357 1358 entropy = 0.0 samples = 1 value = [0, 1] 1356->1358 1361 entropy = 0.0 samples = 2 value = [2, 0] 1360->1361 1362 entropy = 0.0 samples = 7 value = [0, 7] 1360->1362 1366 arrival_date_month_January <= 0.5 entropy = 0.918 samples = 12 value = [8, 4] 1365->1366 1375 entropy = 0.0 samples = 7 value = [7, 0] 1365->1375 1367 meal_SC <= 0.5 entropy = 0.991 samples = 9 value = [5, 4] 1366->1367 1374 entropy = 0.0 samples = 3 value = [3, 0] 1366->1374 1368 assigned_room_type_C <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 1367->1368 1371 country_listed_other <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 1367->1371 1369 entropy = 0.0 samples = 4 value = [4, 0] 1368->1369 1370 entropy = 0.0 samples = 1 value = [0, 1] 1368->1370 1372 entropy = 0.0 samples = 1 value = [1, 0] 1371->1372 1373 entropy = 0.0 samples = 3 value = [0, 3] 1371->1373 1379 arrival_date_day_of_month <= 17.0 entropy = 0.855 samples = 68 value = [49, 19] 1378->1379 1406 arrival_date_day_of_month <= 16.5 entropy = 0.469 samples = 70 value = [63, 7] 1378->1406 1380 assigned_room_type_E <= 0.5 entropy = 0.555 samples = 31 value = [27, 4] 1379->1380 1391 lead_time <= 45.5 entropy = 0.974 samples = 37 value = [22, 15] 1379->1391 1381 lead_time <= 19.0 entropy = 0.469 samples = 30 value = [27, 3] 1380->1381 1390 entropy = 0.0 samples = 1 value = [0, 1] 1380->1390 1382 arrival_date_week_number <= 20.0 entropy = 0.881 samples = 10 value = [7, 3] 1381->1382 1389 entropy = 0.0 samples = 20 value = [20, 0] 1381->1389 1383 hotel_Resort Hotel <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 1382->1383 1388 entropy = 0.0 samples = 4 value = [4, 0] 1382->1388 1384 stays_in_weekend_nights <= 1.0 entropy = 0.811 samples = 4 value = [3, 1] 1383->1384 1387 entropy = 0.0 samples = 2 value = [0, 2] 1383->1387 1385 entropy = 0.0 samples = 3 value = [3, 0] 1384->1385 1386 entropy = 0.0 samples = 1 value = [0, 1] 1384->1386 1392 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 33 value = [22, 11] 1391->1392 1405 entropy = 0.0 samples = 4 value = [0, 4] 1391->1405 1393 lead_time <= 33.5 entropy = 0.722 samples = 25 value = [20, 5] 1392->1393 1400 arrival_date_week_number <= 15.0 entropy = 0.811 samples = 8 value = [2, 6] 1392->1400 1394 lead_time <= 28.5 entropy = 0.918 samples = 15 value = [10, 5] 1393->1394 1399 entropy = 0.0 samples = 10 value = [10, 0] 1393->1399 1395 assigned_room_type_H <= 0.5 entropy = 0.439 samples = 11 value = [10, 1] 1394->1395 1398 entropy = 0.0 samples = 4 value = [0, 4] 1394->1398 1396 entropy = 0.0 samples = 10 value = [10, 0] 1395->1396 1397 entropy = 0.0 samples = 1 value = [0, 1] 1395->1397 1401 arrival_date_month_January <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1400->1401 1404 entropy = 0.0 samples = 5 value = [0, 5] 1400->1404 1402 entropy = 0.0 samples = 2 value = [2, 0] 1401->1402 1403 entropy = 0.0 samples = 1 value = [0, 1] 1401->1403 1407 arrival_date_day_of_month <= 11.5 entropy = 0.679 samples = 39 value = [32, 7] 1406->1407 1422 entropy = 0.0 samples = 31 value = [31, 0] 1406->1422 1408 arrival_date_month_July <= 0.5 entropy = 0.381 samples = 27 value = [25, 2] 1407->1408 1417 arrival_date_week_number <= 19.5 entropy = 0.98 samples = 12 value = [7, 5] 1407->1417 1409 entropy = 0.0 samples = 18 value = [18, 0] 1408->1409 1410 arrival_date_day_of_month <= 6.5 entropy = 0.764 samples = 9 value = [7, 2] 1408->1410 1411 adr <= 71.315 entropy = 0.918 samples = 6 value = [4, 2] 1410->1411 1416 entropy = 0.0 samples = 3 value = [3, 0] 1410->1416 1412 entropy = 0.0 samples = 1 value = [0, 1] 1411->1412 1413 stays_in_week_nights <= 4.0 entropy = 0.722 samples = 5 value = [4, 1] 1411->1413 1414 entropy = 0.0 samples = 3 value = [3, 0] 1413->1414 1415 entropy = 1.0 samples = 2 value = [1, 1] 1413->1415 1418 entropy = 0.0 samples = 6 value = [6, 0] 1417->1418 1419 adults <= 1.5 entropy = 0.65 samples = 6 value = [1, 5] 1417->1419 1420 entropy = 1.0 samples = 2 value = [1, 1] 1419->1420 1421 entropy = 0.0 samples = 4 value = [0, 4] 1419->1421 1424 lead_time <= 50.0 entropy = 0.785 samples = 359 value = [275, 84] 1423->1424 1567 entropy = 0.0 samples = 25 value = [25, 0] 1423->1567 1425 meal_BB <= 0.5 entropy = 0.923 samples = 127 value = [84, 43] 1424->1425 1484 hotel_Resort Hotel <= 0.5 entropy = 0.673 samples = 232 value = [191, 41] 1424->1484 1426 stays_in_week_nights <= 3.5 entropy = 0.976 samples = 22 value = [9, 13] 1425->1426 1437 adr <= 144.45 entropy = 0.863 samples = 105 value = [75, 30] 1425->1437 1427 adr <= 166.375 entropy = 0.998 samples = 17 value = [9, 8] 1426->1427 1436 entropy = 0.0 samples = 5 value = [0, 5] 1426->1436 1428 country_listed_other <= 0.5 entropy = 0.961 samples = 13 value = [5, 8] 1427->1428 1435 entropy = 0.0 samples = 4 value = [4, 0] 1427->1435 1429 adr <= 139.5 entropy = 0.722 samples = 5 value = [4, 1] 1428->1429 1432 lead_time <= 38.5 entropy = 0.544 samples = 8 value = [1, 7] 1428->1432 1430 entropy = 0.0 samples = 1 value = [0, 1] 1429->1430 1431 entropy = 0.0 samples = 4 value = [4, 0] 1429->1431 1433 entropy = 0.0 samples = 6 value = [0, 6] 1432->1433 1434 entropy = 1.0 samples = 2 value = [1, 1] 1432->1434 1438 entropy = 0.0 samples = 13 value = [13, 0] 1437->1438 1439 hotel_Resort Hotel <= 0.5 entropy = 0.911 samples = 92 value = [62, 30] 1437->1439 1440 arrival_date_month_July <= 0.5 entropy = 0.988 samples = 55 value = [31, 24] 1439->1440 1473 lead_time <= 15.0 entropy = 0.639 samples = 37 value = [31, 6] 1439->1473 1441 arrival_date_week_number <= 34.5 entropy = 0.963 samples = 49 value = [30, 19] 1440->1441 1470 arrival_date_day_of_month <= 1.5 entropy = 0.65 samples = 6 value = [1, 5] 1440->1470 1442 arrival_date_month_February <= 0.5 entropy = 0.874 samples = 34 value = [24, 10] 1441->1442 1463 children <= 0.5 entropy = 0.971 samples = 15 value = [6, 9] 1441->1463 1443 arrival_date_year <= 2016.5 entropy = 0.784 samples = 30 value = [23, 7] 1442->1443 1460 arrival_date_day_of_month <= 22.0 entropy = 0.811 samples = 4 value = [1, 3] 1442->1460 1444 arrival_date_day_of_month <= 27.5 entropy = 0.391 samples = 13 value = [12, 1] 1443->1444 1447 arrival_date_day_of_month <= 21.0 entropy = 0.937 samples = 17 value = [11, 6] 1443->1447 1445 entropy = 0.0 samples = 12 value = [12, 0] 1444->1445 1446 entropy = 0.0 samples = 1 value = [0, 1] 1444->1446 1448 arrival_date_day_of_month <= 17.0 entropy = 0.985 samples = 14 value = [8, 6] 1447->1448 1459 entropy = 0.0 samples = 3 value = [3, 0] 1447->1459 1449 arrival_date_week_number <= 11.5 entropy = 0.918 samples = 12 value = [8, 4] 1448->1449 1458 entropy = 0.0 samples = 2 value = [0, 2] 1448->1458 1450 entropy = 0.0 samples = 3 value = [3, 0] 1449->1450 1451 lead_time <= 29.0 entropy = 0.991 samples = 9 value = [5, 4] 1449->1451 1452 children <= 1.0 entropy = 0.722 samples = 5 value = [4, 1] 1451->1452 1455 children <= 1.0 entropy = 0.811 samples = 4 value = [1, 3] 1451->1455 1453 entropy = 0.0 samples = 4 value = [4, 0] 1452->1453 1454 entropy = 0.0 samples = 1 value = [0, 1] 1452->1454 1456 entropy = 0.0 samples = 3 value = [0, 3] 1455->1456 1457 entropy = 0.0 samples = 1 value = [1, 0] 1455->1457 1461 entropy = 1.0 samples = 2 value = [1, 1] 1460->1461 1462 entropy = 0.0 samples = 2 value = [0, 2] 1460->1462 1464 assigned_room_type_D <= 0.5 entropy = 0.89 samples = 13 value = [4, 9] 1463->1464 1469 entropy = 0.0 samples = 2 value = [2, 0] 1463->1469 1465 entropy = 0.0 samples = 6 value = [0, 6] 1464->1465 1466 stays_in_weekend_nights <= 0.5 entropy = 0.985 samples = 7 value = [4, 3] 1464->1466 1467 entropy = 0.0 samples = 3 value = [0, 3] 1466->1467 1468 entropy = 0.0 samples = 4 value = [4, 0] 1466->1468 1471 entropy = 0.0 samples = 1 value = [1, 0] 1470->1471 1472 entropy = 0.0 samples = 5 value = [0, 5] 1470->1472 1474 entropy = 0.0 samples = 2 value = [0, 2] 1473->1474 1475 arrival_date_year <= 2015.5 entropy = 0.513 samples = 35 value = [31, 4] 1473->1475 1476 adr <= 193.375 entropy = 0.971 samples = 5 value = [2, 3] 1475->1476 1479 reserved_room_type_E <= 0.5 entropy = 0.211 samples = 30 value = [29, 1] 1475->1479 1477 entropy = 0.0 samples = 3 value = [0, 3] 1476->1477 1478 entropy = 0.0 samples = 2 value = [2, 0] 1476->1478 1480 entropy = 0.0 samples = 25 value = [25, 0] 1479->1480 1481 adr <= 194.1 entropy = 0.722 samples = 5 value = [4, 1] 1479->1481 1482 entropy = 1.0 samples = 2 value = [1, 1] 1481->1482 1483 entropy = 0.0 samples = 3 value = [3, 0] 1481->1483 1485 arrival_date_day_of_month <= 21.5 entropy = 0.767 samples = 152 value = [118, 34] 1484->1485 1548 arrival_date_year <= 2016.5 entropy = 0.428 samples = 80 value = [73, 7] 1484->1548 1486 lead_time <= 87.5 entropy = 0.852 samples = 108 value = [78, 30] 1485->1486 1539 adr <= 298.3 entropy = 0.439 samples = 44 value = [40, 4] 1485->1539 1487 arrival_date_day_of_month <= 18.5 entropy = 0.619 samples = 39 value = [33, 6] 1486->1487 1506 lead_time <= 110.5 entropy = 0.932 samples = 69 value = [45, 24] 1486->1506 1488 arrival_date_day_of_month <= 8.5 entropy = 0.503 samples = 36 value = [32, 4] 1487->1488 1503 arrival_date_month_May <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 1487->1503 1489 arrival_date_month_May <= 0.5 entropy = 0.702 samples = 21 value = [17, 4] 1488->1489 1502 entropy = 0.0 samples = 15 value = [15, 0] 1488->1502 1490 stays_in_week_nights <= 3.5 entropy = 0.863 samples = 14 value = [10, 4] 1489->1490 1501 entropy = 0.0 samples = 7 value = [7, 0] 1489->1501 1491 adr <= 144.9 entropy = 0.971 samples = 10 value = [6, 4] 1490->1491 1500 entropy = 0.0 samples = 4 value = [4, 0] 1490->1500 1492 entropy = 0.0 samples = 2 value = [2, 0] 1491->1492 1493 arrival_date_day_of_month <= 3.5 entropy = 1.0 samples = 8 value = [4, 4] 1491->1493 1494 assigned_room_type_A <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 1493->1494 1497 arrival_date_week_number <= 23.5 entropy = 0.811 samples = 4 value = [1, 3] 1493->1497 1495 entropy = 0.0 samples = 3 value = [3, 0] 1494->1495 1496 entropy = 0.0 samples = 1 value = [0, 1] 1494->1496 1498 entropy = 0.0 samples = 3 value = [0, 3] 1497->1498 1499 entropy = 0.0 samples = 1 value = [1, 0] 1497->1499 1504 entropy = 0.0 samples = 1 value = [1, 0] 1503->1504 1505 entropy = 0.0 samples = 2 value = [0, 2] 1503->1505 1507 stays_in_week_nights <= 1.5 entropy = 0.982 samples = 19 value = [8, 11] 1506->1507 1518 adr <= 152.55 entropy = 0.827 samples = 50 value = [37, 13] 1506->1518 1508 entropy = 0.0 samples = 5 value = [0, 5] 1507->1508 1509 lead_time <= 100.5 entropy = 0.985 samples = 14 value = [8, 6] 1507->1509 1510 country_DEU <= 0.5 entropy = 0.592 samples = 7 value = [6, 1] 1509->1510 1513 adr <= 147.0 entropy = 0.863 samples = 7 value = [2, 5] 1509->1513 1511 entropy = 0.0 samples = 6 value = [6, 0] 1510->1511 1512 entropy = 0.0 samples = 1 value = [0, 1] 1510->1512 1514 arrival_date_day_of_month <= 15.0 entropy = 0.918 samples = 3 value = [2, 1] 1513->1514 1517 entropy = 0.0 samples = 4 value = [0, 4] 1513->1517 1515 entropy = 0.0 samples = 2 value = [2, 0] 1514->1515 1516 entropy = 0.0 samples = 1 value = [0, 1] 1514->1516 1519 stays_in_week_nights <= 2.5 entropy = 0.966 samples = 23 value = [14, 9] 1518->1519 1530 arrival_date_day_of_month <= 10.5 entropy = 0.605 samples = 27 value = [23, 4] 1518->1530 1520 arrival_date_month_May <= 0.5 entropy = 0.985 samples = 14 value = [6, 8] 1519->1520 1527 country_DEU <= 0.5 entropy = 0.503 samples = 9 value = [8, 1] 1519->1527 1521 lead_time <= 135.5 entropy = 0.845 samples = 11 value = [3, 8] 1520->1521 1526 entropy = 0.0 samples = 3 value = [3, 0] 1520->1526 1522 entropy = 0.0 samples = 2 value = [2, 0] 1521->1522 1523 lead_time <= 172.5 entropy = 0.503 samples = 9 value = [1, 8] 1521->1523 1524 entropy = 0.0 samples = 8 value = [0, 8] 1523->1524 1525 entropy = 0.0 samples = 1 value = [1, 0] 1523->1525 1528 entropy = 0.0 samples = 8 value = [8, 0] 1527->1528 1529 entropy = 0.0 samples = 1 value = [0, 1] 1527->1529 1531 adr <= 183.6 entropy = 0.918 samples = 12 value = [8, 4] 1530->1531 1538 entropy = 0.0 samples = 15 value = [15, 0] 1530->1538 1532 lead_time <= 168.5 entropy = 0.544 samples = 8 value = [7, 1] 1531->1532 1535 lead_time <= 132.0 entropy = 0.811 samples = 4 value = [1, 3] 1531->1535 1533 entropy = 0.0 samples = 7 value = [7, 0] 1532->1533 1534 entropy = 0.0 samples = 1 value = [0, 1] 1532->1534 1536 entropy = 0.0 samples = 3 value = [0, 3] 1535->1536 1537 entropy = 0.0 samples = 1 value = [1, 0] 1535->1537 1540 lead_time <= 129.0 entropy = 0.365 samples = 43 value = [40, 3] 1539->1540 1547 entropy = 0.0 samples = 1 value = [0, 1] 1539->1547 1541 entropy = 0.0 samples = 31 value = [31, 0] 1540->1541 1542 lead_time <= 148.5 entropy = 0.811 samples = 12 value = [9, 3] 1540->1542 1543 reserved_room_type_A <= 0.5 entropy = 0.971 samples = 5 value = [2, 3] 1542->1543 1546 entropy = 0.0 samples = 7 value = [7, 0] 1542->1546 1544 entropy = 0.0 samples = 3 value = [0, 3] 1543->1544 1545 entropy = 0.0 samples = 2 value = [2, 0] 1543->1545 1549 adr <= 215.665 entropy = 0.632 samples = 44 value = [37, 7] 1548->1549 1566 entropy = 0.0 samples = 36 value = [36, 0] 1548->1566 1550 reserved_room_type_G <= 0.5 entropy = 0.758 samples = 32 value = [25, 7] 1549->1550 1565 entropy = 0.0 samples = 12 value = [12, 0] 1549->1565 1551 lead_time <= 102.5 entropy = 0.544 samples = 24 value = [21, 3] 1550->1551 1560 lead_time <= 87.0 entropy = 1.0 samples = 8 value = [4, 4] 1550->1560 1552 lead_time <= 85.5 entropy = 0.811 samples = 12 value = [9, 3] 1551->1552 1559 entropy = 0.0 samples = 12 value = [12, 0] 1551->1559 1553 lead_time <= 60.5 entropy = 0.469 samples = 10 value = [9, 1] 1552->1553 1558 entropy = 0.0 samples = 2 value = [0, 2] 1552->1558 1554 arrival_date_year <= 2015.5 entropy = 0.918 samples = 3 value = [2, 1] 1553->1554 1557 entropy = 0.0 samples = 7 value = [7, 0] 1553->1557 1555 entropy = 0.0 samples = 1 value = [0, 1] 1554->1555 1556 entropy = 0.0 samples = 2 value = [2, 0] 1554->1556 1561 entropy = 0.0 samples = 3 value = [3, 0] 1560->1561 1562 arrival_date_week_number <= 26.0 entropy = 0.722 samples = 5 value = [1, 4] 1560->1562 1563 entropy = 0.0 samples = 1 value = [1, 0] 1562->1563 1564 entropy = 0.0 samples = 4 value = [0, 4] 1562->1564 1569 assigned_room_type_D <= 0.5 entropy = 0.851 samples = 65 value = [18, 47] 1568->1569 1598 adr <= 227.75 entropy = 0.993 samples = 60 value = [33, 27] 1568->1598 1570 arrival_date_day_of_month <= 20.5 entropy = 0.729 samples = 54 value = [11, 43] 1569->1570 1591 country_listed_other <= 0.5 entropy = 0.946 samples = 11 value = [7, 4] 1569->1591 1571 arrival_date_week_number <= 13.0 entropy = 0.896 samples = 32 value = [10, 22] 1570->1571 1588 adr <= 46.715 entropy = 0.267 samples = 22 value = [1, 21] 1570->1588 1572 adr <= 72.68 entropy = 0.954 samples = 8 value = [5, 3] 1571->1572 1579 country_PRT <= 0.5 entropy = 0.738 samples = 24 value = [5, 19] 1571->1579 1573 entropy = 0.0 samples = 3 value = [3, 0] 1572->1573 1574 arrival_date_day_of_month <= 9.5 entropy = 0.971 samples = 5 value = [2, 3] 1572->1574 1575 entropy = 0.0 samples = 2 value = [0, 2] 1574->1575 1576 lead_time <= 13.5 entropy = 0.918 samples = 3 value = [2, 1] 1574->1576 1577 entropy = 0.0 samples = 1 value = [0, 1] 1576->1577 1578 entropy = 0.0 samples = 2 value = [2, 0] 1576->1578 1580 lead_time <= 182.5 entropy = 0.323 samples = 17 value = [1, 16] 1579->1580 1583 adr <= 93.87 entropy = 0.985 samples = 7 value = [4, 3] 1579->1583 1581 entropy = 0.0 samples = 15 value = [0, 15] 1580->1581 1582 entropy = 1.0 samples = 2 value = [1, 1] 1580->1582 1584 entropy = 0.0 samples = 2 value = [0, 2] 1583->1584 1585 meal_SC <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 1583->1585 1586 entropy = 0.0 samples = 4 value = [4, 0] 1585->1586 1587 entropy = 0.0 samples = 1 value = [0, 1] 1585->1587 1589 entropy = 1.0 samples = 2 value = [1, 1] 1588->1589 1590 entropy = 0.0 samples = 20 value = [0, 20] 1588->1590 1592 entropy = 0.0 samples = 5 value = [5, 0] 1591->1592 1593 adr <= 100.58 entropy = 0.918 samples = 6 value = [2, 4] 1591->1593 1594 entropy = 0.0 samples = 3 value = [0, 3] 1593->1594 1595 adr <= 121.815 entropy = 0.918 samples = 3 value = [2, 1] 1593->1595 1596 entropy = 0.0 samples = 2 value = [2, 0] 1595->1596 1597 entropy = 0.0 samples = 1 value = [0, 1] 1595->1597 1599 arrival_date_month_May <= 0.5 entropy = 0.971 samples = 55 value = [33, 22] 1598->1599 1630 entropy = 0.0 samples = 5 value = [0, 5] 1598->1630 1600 lead_time <= 154.5 entropy = 0.911 samples = 46 value = [31, 15] 1599->1600 1625 stays_in_week_nights <= 2.5 entropy = 0.764 samples = 9 value = [2, 7] 1599->1625 1601 lead_time <= 137.0 entropy = 0.849 samples = 40 value = [29, 11] 1600->1601 1620 booking_changes <= 1.5 entropy = 0.918 samples = 6 value = [2, 4] 1600->1620 1602 lead_time <= 124.0 entropy = 0.918 samples = 33 value = [22, 11] 1601->1602 1619 entropy = 0.0 samples = 7 value = [7, 0] 1601->1619 1603 arrival_date_week_number <= 34.0 entropy = 0.811 samples = 28 value = [21, 7] 1602->1603 1616 arrival_date_week_number <= 15.5 entropy = 0.722 samples = 5 value = [1, 4] 1602->1616 1604 arrival_date_week_number <= 24.5 entropy = 0.575 samples = 22 value = [19, 3] 1603->1604 1611 stays_in_week_nights <= 4.5 entropy = 0.918 samples = 6 value = [2, 4] 1603->1611 1605 stays_in_weekend_nights <= 0.5 entropy = 0.918 samples = 9 value = [6, 3] 1604->1605 1610 entropy = 0.0 samples = 13 value = [13, 0] 1604->1610 1606 entropy = 0.0 samples = 2 value = [0, 2] 1605->1606 1607 stays_in_week_nights <= 0.5 entropy = 0.592 samples = 7 value = [6, 1] 1605->1607 1608 entropy = 0.0 samples = 1 value = [0, 1] 1607->1608 1609 entropy = 0.0 samples = 6 value = [6, 0] 1607->1609 1612 entropy = 0.0 samples = 3 value = [0, 3] 1611->1612 1613 adr <= 209.06 entropy = 0.918 samples = 3 value = [2, 1] 1611->1613 1614 entropy = 0.0 samples = 2 value = [2, 0] 1613->1614 1615 entropy = 0.0 samples = 1 value = [0, 1] 1613->1615 1617 entropy = 0.0 samples = 1 value = [1, 0] 1616->1617 1618 entropy = 0.0 samples = 4 value = [0, 4] 1616->1618 1621 entropy = 0.0 samples = 3 value = [0, 3] 1620->1621 1622 hotel_City Hotel <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1620->1622 1623 entropy = 0.0 samples = 1 value = [0, 1] 1622->1623 1624 entropy = 0.0 samples = 2 value = [2, 0] 1622->1624 1626 booking_changes <= 1.5 entropy = 1.0 samples = 4 value = [2, 2] 1625->1626 1629 entropy = 0.0 samples = 5 value = [0, 5] 1625->1629 1627 entropy = 0.0 samples = 2 value = [0, 2] 1626->1627 1628 entropy = 0.0 samples = 2 value = [2, 0] 1626->1628 1632 stays_in_week_nights <= 3.5 entropy = 0.863 samples = 133 value = [38, 95] 1631->1632 1695 arrival_date_month_August <= 0.5 entropy = 0.971 samples = 20 value = [12, 8] 1631->1695 1633 arrival_date_week_number <= 30.5 entropy = 0.95 samples = 84 value = [31, 53] 1632->1633 1676 adr <= 67.935 entropy = 0.592 samples = 49 value = [7, 42] 1632->1676 1634 lead_time <= 206.5 entropy = 0.998 samples = 38 value = [20, 18] 1633->1634 1659 stays_in_weekend_nights <= 0.5 entropy = 0.794 samples = 46 value = [11, 35] 1633->1659 1635 entropy = 0.0 samples = 2 value = [0, 2] 1634->1635 1636 arrival_date_day_of_month <= 1.5 entropy = 0.991 samples = 36 value = [20, 16] 1634->1636 1637 entropy = 0.0 samples = 3 value = [3, 0] 1636->1637 1638 assigned_room_type_D <= 0.5 entropy = 0.999 samples = 33 value = [17, 16] 1636->1638 1639 adr <= 77.285 entropy = 0.987 samples = 30 value = [17, 13] 1638->1639 1658 entropy = 0.0 samples = 3 value = [0, 3] 1638->1658 1640 entropy = 0.0 samples = 2 value = [0, 2] 1639->1640 1641 arrival_date_week_number <= 8.5 entropy = 0.967 samples = 28 value = [17, 11] 1639->1641 1642 entropy = 0.0 samples = 3 value = [3, 0] 1641->1642 1643 adr <= 103.05 entropy = 0.99 samples = 25 value = [14, 11] 1641->1643 1644 arrival_date_day_of_month <= 25.0 entropy = 0.961 samples = 13 value = [5, 8] 1643->1644 1653 arrival_date_day_of_month <= 14.5 entropy = 0.811 samples = 12 value = [9, 3] 1643->1653 1645 meal_BB <= 0.5 entropy = 1.0 samples = 10 value = [5, 5] 1644->1645 1652 entropy = 0.0 samples = 3 value = [0, 3] 1644->1652 1646 stays_in_week_nights <= 2.5 entropy = 0.722 samples = 5 value = [4, 1] 1645->1646 1649 country_listed_other <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 1645->1649 1647 entropy = 0.0 samples = 3 value = [3, 0] 1646->1647 1648 entropy = 1.0 samples = 2 value = [1, 1] 1646->1648 1650 entropy = 0.0 samples = 1 value = [1, 0] 1649->1650 1651 entropy = 0.0 samples = 4 value = [0, 4] 1649->1651 1654 country_listed_other <= 0.5 entropy = 0.971 samples = 5 value = [2, 3] 1653->1654 1657 entropy = 0.0 samples = 7 value = [7, 0] 1653->1657 1655 entropy = 0.0 samples = 2 value = [2, 0] 1654->1655 1656 entropy = 0.0 samples = 3 value = [0, 3] 1654->1656 1660 arrival_date_day_of_month <= 18.0 entropy = 1.0 samples = 12 value = [6, 6] 1659->1660 1665 lead_time <= 205.5 entropy = 0.602 samples = 34 value = [5, 29] 1659->1665 1661 arrival_date_month_September <= 0.5 entropy = 0.592 samples = 7 value = [1, 6] 1660->1661 1664 entropy = 0.0 samples = 5 value = [5, 0] 1660->1664 1662 entropy = 0.0 samples = 6 value = [0, 6] 1661->1662 1663 entropy = 0.0 samples = 1 value = [1, 0] 1661->1663 1666 entropy = 0.0 samples = 1 value = [1, 0] 1665->1666 1667 adr <= 74.05 entropy = 0.533 samples = 33 value = [4, 29] 1665->1667 1668 entropy = 0.0 samples = 1 value = [1, 0] 1667->1668 1669 country_listed_other <= 0.5 entropy = 0.449 samples = 32 value = [3, 29] 1667->1669 1670 adults <= 2.5 entropy = 0.845 samples = 11 value = [3, 8] 1669->1670 1675 entropy = 0.0 samples = 21 value = [0, 21] 1669->1675 1671 arrival_date_week_number <= 43.5 entropy = 0.503 samples = 9 value = [1, 8] 1670->1671 1674 entropy = 0.0 samples = 2 value = [2, 0] 1670->1674 1672 entropy = 0.0 samples = 7 value = [0, 7] 1671->1672 1673 entropy = 1.0 samples = 2 value = [1, 1] 1671->1673 1677 lead_time <= 226.5 entropy = 1.0 samples = 8 value = [4, 4] 1676->1677 1682 reserved_room_type_F <= 0.5 entropy = 0.378 samples = 41 value = [3, 38] 1676->1682 1678 entropy = 0.0 samples = 3 value = [3, 0] 1677->1678 1679 stays_in_week_nights <= 6.5 entropy = 0.722 samples = 5 value = [1, 4] 1677->1679 1680 entropy = 0.0 samples = 4 value = [0, 4] 1679->1680 1681 entropy = 0.0 samples = 1 value = [1, 0] 1679->1681 1683 arrival_date_month_August <= 0.5 entropy = 0.179 samples = 37 value = [1, 36] 1682->1683 1690 lead_time <= 214.5 entropy = 1.0 samples = 4 value = [2, 2] 1682->1690 1684 entropy = 0.0 samples = 29 value = [0, 29] 1683->1684 1685 adr <= 102.825 entropy = 0.544 samples = 8 value = [1, 7] 1683->1685 1686 entropy = 0.0 samples = 4 value = [0, 4] 1685->1686 1687 meal_SC <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 1685->1687 1688 entropy = 0.0 samples = 3 value = [0, 3] 1687->1688 1689 entropy = 0.0 samples = 1 value = [1, 0] 1687->1689 1691 entropy = 0.0 samples = 1 value = [0, 1] 1690->1691 1692 country_listed_other <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1690->1692 1693 entropy = 0.0 samples = 1 value = [0, 1] 1692->1693 1694 entropy = 0.0 samples = 2 value = [2, 0] 1692->1694 1696 arrival_date_day_of_month <= 16.5 entropy = 0.779 samples = 13 value = [10, 3] 1695->1696 1703 adr <= 87.65 entropy = 0.863 samples = 7 value = [2, 5] 1695->1703 1697 assigned_room_type_D <= 0.5 entropy = 0.985 samples = 7 value = [4, 3] 1696->1697 1702 entropy = 0.0 samples = 6 value = [6, 0] 1696->1702 1698 arrival_date_year <= 2015.5 entropy = 0.722 samples = 5 value = [4, 1] 1697->1698 1701 entropy = 0.0 samples = 2 value = [0, 2] 1697->1701 1699 entropy = 0.0 samples = 1 value = [0, 1] 1698->1699 1700 entropy = 0.0 samples = 4 value = [4, 0] 1698->1700 1704 stays_in_week_nights <= 5.0 entropy = 0.918 samples = 3 value = [2, 1] 1703->1704 1707 entropy = 0.0 samples = 4 value = [0, 4] 1703->1707 1705 entropy = 0.0 samples = 1 value = [0, 1] 1704->1705 1706 entropy = 0.0 samples = 2 value = [2, 0] 1704->1706 1709 country_PRT <= 0.5 entropy = 0.433 samples = 495 value = [44, 451] 1708->1709 1808 country_PRT <= 0.5 entropy = 0.881 samples = 2645 value = [792, 1853] 1708->1808 1710 arrival_date_week_number <= 30.5 entropy = 0.289 samples = 316 value = [16, 300] 1709->1710 1755 lead_time <= 1.5 entropy = 0.626 samples = 179 value = [28, 151] 1709->1755 1711 arrival_date_week_number <= 29.5 entropy = 0.405 samples = 173 value = [14, 159] 1710->1711 1746 assigned_room_type_B <= 0.5 entropy = 0.106 samples = 143 value = [2, 141] 1710->1746 1712 lead_time <= 4.5 entropy = 0.368 samples = 170 value = [12, 158] 1711->1712 1743 stays_in_week_nights <= 2.5 entropy = 0.918 samples = 3 value = [2, 1] 1711->1743 1713 arrival_date_week_number <= 2.5 entropy = 0.098 samples = 79 value = [1, 78] 1712->1713 1718 assigned_room_type_D <= 0.5 entropy = 0.532 samples = 91 value = [11, 80] 1712->1718 1714 assigned_room_type_D <= 0.5 entropy = 0.503 samples = 9 value = [1, 8] 1713->1714 1717 entropy = 0.0 samples = 70 value = [0, 70] 1713->1717 1715 entropy = 0.0 samples = 7 value = [0, 7] 1714->1715 1716 entropy = 1.0 samples = 2 value = [1, 1] 1714->1716 1719 arrival_date_month_May <= 0.5 entropy = 0.708 samples = 57 value = [11, 46] 1718->1719 1742 entropy = 0.0 samples = 34 value = [0, 34] 1718->1742 1720 adr <= 108.4 entropy = 0.802 samples = 45 value = [11, 34] 1719->1720 1741 entropy = 0.0 samples = 12 value = [0, 12] 1719->1741 1721 lead_time <= 7.5 entropy = 0.439 samples = 22 value = [2, 20] 1720->1721 1728 arrival_date_week_number <= 17.5 entropy = 0.966 samples = 23 value = [9, 14] 1720->1728 1722 entropy = 0.0 samples = 16 value = [0, 16] 1721->1722 1723 meal_SC <= 0.5 entropy = 0.918 samples = 6 value = [2, 4] 1721->1723 1724 entropy = 0.0 samples = 3 value = [0, 3] 1723->1724 1725 stays_in_week_nights <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1723->1725 1726 entropy = 0.0 samples = 1 value = [0, 1] 1725->1726 1727 entropy = 0.0 samples = 2 value = [2, 0] 1725->1727 1729 entropy = 0.0 samples = 6 value = [6, 0] 1728->1729 1730 arrival_date_day_of_month <= 14.0 entropy = 0.672 samples = 17 value = [3, 14] 1728->1730 1731 reserved_room_type_A <= 0.5 entropy = 0.954 samples = 8 value = [3, 5] 1730->1731 1740 entropy = 0.0 samples = 9 value = [0, 9] 1730->1740 1732 entropy = 0.0 samples = 2 value = [0, 2] 1731->1732 1733 stays_in_week_nights <= 1.5 entropy = 1.0 samples = 6 value = [3, 3] 1731->1733 1734 adr <= 135.0 entropy = 0.971 samples = 5 value = [2, 3] 1733->1734 1739 entropy = 0.0 samples = 1 value = [1, 0] 1733->1739 1735 arrival_date_week_number <= 27.5 entropy = 0.918 samples = 3 value = [2, 1] 1734->1735 1738 entropy = 0.0 samples = 2 value = [0, 2] 1734->1738 1736 entropy = 0.0 samples = 2 value = [2, 0] 1735->1736 1737 entropy = 0.0 samples = 1 value = [0, 1] 1735->1737 1744 entropy = 0.0 samples = 2 value = [2, 0] 1743->1744 1745 entropy = 0.0 samples = 1 value = [0, 1] 1743->1745 1747 arrival_date_day_of_month <= 4.5 entropy = 0.061 samples = 141 value = [1, 140] 1746->1747 1754 entropy = 1.0 samples = 2 value = [1, 1] 1746->1754 1748 adr <= 167.16 entropy = 0.337 samples = 16 value = [1, 15] 1747->1748 1753 entropy = 0.0 samples = 125 value = [0, 125] 1747->1753 1749 entropy = 0.0 samples = 13 value = [0, 13] 1748->1749 1750 arrival_date_year <= 2016.5 entropy = 0.918 samples = 3 value = [1, 2] 1748->1750 1751 entropy = 0.0 samples = 1 value = [1, 0] 1750->1751 1752 entropy = 0.0 samples = 2 value = [0, 2] 1750->1752 1756 stays_in_weekend_nights <= 1.5 entropy = 0.129 samples = 56 value = [1, 55] 1755->1756 1761 adr <= 75.25 entropy = 0.759 samples = 123 value = [27, 96] 1755->1761 1757 entropy = 0.0 samples = 53 value = [0, 53] 1756->1757 1758 adr <= 135.69 entropy = 0.918 samples = 3 value = [1, 2] 1756->1758 1759 entropy = 0.0 samples = 2 value = [0, 2] 1758->1759 1760 entropy = 0.0 samples = 1 value = [1, 0] 1758->1760 1762 arrival_date_day_of_month <= 8.0 entropy = 0.316 samples = 35 value = [2, 33] 1761->1762 1767 arrival_date_month_April <= 0.5 entropy = 0.861 samples = 88 value = [25, 63] 1761->1767 1763 total_of_special_requests <= 1.5 entropy = 0.918 samples = 6 value = [2, 4] 1762->1763 1766 entropy = 0.0 samples = 29 value = [0, 29] 1762->1766 1764 entropy = 0.0 samples = 4 value = [0, 4] 1763->1764 1765 entropy = 0.0 samples = 2 value = [2, 0] 1763->1765 1768 assigned_room_type_F <= 0.5 entropy = 0.896 samples = 80 value = [25, 55] 1767->1768 1807 entropy = 0.0 samples = 8 value = [0, 8] 1767->1807 1769 arrival_date_month_June <= 0.5 entropy = 0.918 samples = 75 value = [25, 50] 1768->1769 1806 entropy = 0.0 samples = 5 value = [0, 5] 1768->1806 1770 children <= 1.5 entropy = 0.94 samples = 70 value = [25, 45] 1769->1770 1805 entropy = 0.0 samples = 5 value = [0, 5] 1769->1805 1771 lead_time <= 3.5 entropy = 0.923 samples = 68 value = [23, 45] 1770->1771 1804 entropy = 0.0 samples = 2 value = [2, 0] 1770->1804 1772 arrival_date_week_number <= 27.5 entropy = 0.684 samples = 22 value = [4, 18] 1771->1772 1779 meal_SC <= 0.5 entropy = 0.978 samples = 46 value = [19, 27] 1771->1779 1773 lead_time <= 2.5 entropy = 0.946 samples = 11 value = [4, 7] 1772->1773 1778 entropy = 0.0 samples = 11 value = [0, 11] 1772->1778 1774 adr <= 88.5 entropy = 0.918 samples = 6 value = [4, 2] 1773->1774 1777 entropy = 0.0 samples = 5 value = [0, 5] 1773->1777 1775 entropy = 0.0 samples = 2 value = [0, 2] 1774->1775 1776 entropy = 0.0 samples = 4 value = [4, 0] 1774->1776 1780 adr <= 156.5 entropy = 0.998 samples = 40 value = [19, 21] 1779->1780 1803 entropy = 0.0 samples = 6 value = [0, 6] 1779->1803 1781 babies <= 0.5 entropy = 0.978 samples = 29 value = [17, 12] 1780->1781 1798 agent_unknown <= 0.5 entropy = 0.684 samples = 11 value = [2, 9] 1780->1798 1782 stays_in_week_nights <= 0.5 entropy = 0.951 samples = 27 value = [17, 10] 1781->1782 1797 entropy = 0.0 samples = 2 value = [0, 2] 1781->1797 1783 entropy = 0.0 samples = 2 value = [0, 2] 1782->1783 1784 adr <= 117.0 entropy = 0.904 samples = 25 value = [17, 8] 1782->1784 1785 arrival_date_day_of_month <= 22.5 entropy = 1.0 samples = 16 value = [8, 8] 1784->1785 1796 entropy = 0.0 samples = 9 value = [9, 0] 1784->1796 1786 lead_time <= 7.0 entropy = 0.985 samples = 14 value = [8, 6] 1785->1786 1795 entropy = 0.0 samples = 2 value = [0, 2] 1785->1795 1787 stays_in_week_nights <= 1.5 entropy = 0.918 samples = 9 value = [3, 6] 1786->1787 1794 entropy = 0.0 samples = 5 value = [5, 0] 1786->1794 1788 entropy = 0.0 samples = 4 value = [0, 4] 1787->1788 1789 arrival_date_year <= 2015.5 entropy = 0.971 samples = 5 value = [3, 2] 1787->1789 1790 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 1789->1790 1793 entropy = 0.0 samples = 2 value = [2, 0] 1789->1793 1791 entropy = 0.0 samples = 2 value = [0, 2] 1790->1791 1792 entropy = 0.0 samples = 1 value = [1, 0] 1790->1792 1799 arrival_date_month_May <= 0.5 entropy = 0.469 samples = 10 value = [1, 9] 1798->1799 1802 entropy = 0.0 samples = 1 value = [1, 0] 1798->1802 1800 entropy = 0.0 samples = 8 value = [0, 8] 1799->1800 1801 entropy = 1.0 samples = 2 value = [1, 1] 1799->1801 1809 arrival_date_year <= 2015.5 entropy = 0.823 samples = 2166 value = [558, 1608] 1808->1809 2658 previous_cancellations <= 0.5 entropy = 1.0 samples = 479 value = [234, 245] 1808->2658 1810 entropy = 0.0 samples = 149 value = [0, 149] 1809->1810 1811 customer_type_Transient <= 0.5 entropy = 0.851 samples = 2017 value = [558, 1459] 1809->1811 1812 stays_in_week_nights <= 4.5 entropy = 0.157 samples = 131 value = [3, 128] 1811->1812 1823 country_listed_other <= 0.5 entropy = 0.874 samples = 1886 value = [555, 1331] 1811->1823 1813 arrival_date_month_January <= 0.5 entropy = 0.073 samples = 114 value = [1, 113] 1812->1813 1818 arrival_date_day_of_month <= 29.5 entropy = 0.523 samples = 17 value = [2, 15] 1812->1818 1814 entropy = 0.0 samples = 108 value = [0, 108] 1813->1814 1815 adults <= 1.5 entropy = 0.65 samples = 6 value = [1, 5] 1813->1815 1816 entropy = 0.0 samples = 1 value = [1, 0] 1815->1816 1817 entropy = 0.0 samples = 5 value = [0, 5] 1815->1817 1819 hotel_City Hotel <= 0.5 entropy = 0.337 samples = 16 value = [1, 15] 1818->1819 1822 entropy = 0.0 samples = 1 value = [1, 0] 1818->1822 1820 entropy = 0.0 samples = 14 value = [0, 14] 1819->1820 1821 entropy = 1.0 samples = 2 value = [1, 1] 1819->1821 1824 arrival_date_day_of_month <= 25.5 entropy = 0.727 samples = 464 value = [94, 370] 1823->1824 2001 adr <= 118.655 entropy = 0.909 samples = 1422 value = [461, 961] 1823->2001 1825 arrival_date_day_of_month <= 19.5 entropy = 0.667 samples = 379 value = [66, 313] 1824->1825 1954 arrival_date_year <= 2016.5 entropy = 0.914 samples = 85 value = [28, 57] 1824->1954 1826 adr <= 94.85 entropy = 0.719 samples = 297 value = [59, 238] 1825->1826 1937 arrival_date_week_number <= 52.5 entropy = 0.421 samples = 82 value = [7, 75] 1825->1937 1827 adr <= 79.9 entropy = 0.507 samples = 89 value = [10, 79] 1826->1827 1844 adr <= 97.49 entropy = 0.788 samples = 208 value = [49, 159] 1826->1844 1828 adr <= 73.45 entropy = 0.801 samples = 41 value = [10, 31] 1827->1828 1843 entropy = 0.0 samples = 48 value = [0, 48] 1827->1843 1829 adr <= 39.6 entropy = 0.402 samples = 25 value = [2, 23] 1828->1829 1836 arrival_date_day_of_month <= 8.5 entropy = 1.0 samples = 16 value = [8, 8] 1828->1836 1830 adr <= 31.72 entropy = 0.971 samples = 5 value = [2, 3] 1829->1830 1835 entropy = 0.0 samples = 20 value = [0, 20] 1829->1835 1831 entropy = 0.0 samples = 2 value = [0, 2] 1830->1831 1832 arrival_date_day_of_month <= 6.5 entropy = 0.918 samples = 3 value = [2, 1] 1830->1832 1833 entropy = 0.0 samples = 1 value = [0, 1] 1832->1833 1834 entropy = 0.0 samples = 2 value = [2, 0] 1832->1834 1837 entropy = 0.0 samples = 4 value = [4, 0] 1836->1837 1838 arrival_date_week_number <= 31.5 entropy = 0.918 samples = 12 value = [4, 8] 1836->1838 1839 country_DEU <= 0.5 entropy = 0.918 samples = 6 value = [4, 2] 1838->1839 1842 entropy = 0.0 samples = 6 value = [0, 6] 1838->1842 1840 entropy = 0.0 samples = 4 value = [4, 0] 1839->1840 1841 entropy = 0.0 samples = 2 value = [0, 2] 1839->1841 1845 arrival_date_day_of_month <= 16.5 entropy = 0.971 samples = 10 value = [6, 4] 1844->1845 1852 adr <= 100.15 entropy = 0.755 samples = 198 value = [43, 155] 1844->1852 1846 arrival_date_year <= 2016.5 entropy = 0.985 samples = 7 value = [3, 4] 1845->1846 1851 entropy = 0.0 samples = 3 value = [3, 0] 1845->1851 1847 total_of_special_requests <= 1.5 entropy = 0.811 samples = 4 value = [3, 1] 1846->1847 1850 entropy = 0.0 samples = 3 value = [0, 3] 1846->1850 1848 entropy = 0.0 samples = 3 value = [3, 0] 1847->1848 1849 entropy = 0.0 samples = 1 value = [0, 1] 1847->1849 1853 entropy = 0.0 samples = 11 value = [0, 11] 1852->1853 1854 meal_HB <= 0.5 entropy = 0.778 samples = 187 value = [43, 144] 1852->1854 1855 adr <= 235.0 entropy = 0.807 samples = 170 value = [42, 128] 1854->1855 1934 arrival_date_day_of_month <= 16.5 entropy = 0.323 samples = 17 value = [1, 16] 1854->1934 1856 arrival_date_day_of_month <= 4.5 entropy = 0.787 samples = 166 value = [39, 127] 1855->1856 1931 arrival_date_day_of_month <= 18.5 entropy = 0.811 samples = 4 value = [3, 1] 1855->1931 1857 assigned_room_type_D <= 0.5 entropy = 0.961 samples = 26 value = [10, 16] 1856->1857 1870 adr <= 111.05 entropy = 0.736 samples = 140 value = [29, 111] 1856->1870 1858 stays_in_week_nights <= 3.5 entropy = 0.831 samples = 19 value = [5, 14] 1857->1858 1865 arrival_date_week_number <= 22.5 entropy = 0.863 samples = 7 value = [5, 2] 1857->1865 1859 country_FRA <= 0.5 entropy = 0.672 samples = 17 value = [3, 14] 1858->1859 1864 entropy = 0.0 samples = 2 value = [2, 0] 1858->1864 1860 adr <= 113.4 entropy = 0.971 samples = 5 value = [3, 2] 1859->1860 1863 entropy = 0.0 samples = 12 value = [0, 12] 1859->1863 1861 entropy = 0.0 samples = 2 value = [0, 2] 1860->1861 1862 entropy = 0.0 samples = 3 value = [3, 0] 1860->1862 1866 stays_in_week_nights <= 1.5 entropy = 0.65 samples = 6 value = [5, 1] 1865->1866 1869 entropy = 0.0 samples = 1 value = [0, 1] 1865->1869 1867 entropy = 0.0 samples = 1 value = [0, 1] 1866->1867 1868 entropy = 0.0 samples = 5 value = [5, 0] 1866->1868 1871 adr <= 107.865 entropy = 0.98 samples = 24 value = [10, 14] 1870->1871 1890 adr <= 134.75 entropy = 0.643 samples = 116 value = [19, 97] 1870->1890 1872 arrival_date_day_of_month <= 11.5 entropy = 0.75 samples = 14 value = [3, 11] 1871->1872 1881 arrival_date_year <= 2016.5 entropy = 0.881 samples = 10 value = [7, 3] 1871->1881 1873 assigned_room_type_A <= 0.5 entropy = 0.954 samples = 8 value = [3, 5] 1872->1873 1880 entropy = 0.0 samples = 6 value = [0, 6] 1872->1880 1874 entropy = 0.0 samples = 3 value = [0, 3] 1873->1874 1875 stays_in_week_nights <= 3.5 entropy = 0.971 samples = 5 value = [3, 2] 1873->1875 1876 lead_time <= 92.5 entropy = 0.918 samples = 3 value = [1, 2] 1875->1876 1879 entropy = 0.0 samples = 2 value = [2, 0] 1875->1879 1877 entropy = 0.0 samples = 2 value = [0, 2] 1876->1877 1878 entropy = 0.0 samples = 1 value = [1, 0] 1876->1878 1882 country_FRA <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 1881->1882 1889 entropy = 0.0 samples = 4 value = [4, 0] 1881->1889 1883 entropy = 0.0 samples = 1 value = [0, 1] 1882->1883 1884 arrival_date_month_May <= 0.5 entropy = 0.971 samples = 5 value = [3, 2] 1882->1884 1885 stays_in_week_nights <= 2.0 entropy = 0.811 samples = 4 value = [3, 1] 1884->1885 1888 entropy = 0.0 samples = 1 value = [0, 1] 1884->1888 1886 entropy = 1.0 samples = 2 value = [1, 1] 1885->1886 1887 entropy = 0.0 samples = 2 value = [2, 0] 1885->1887 1891 lead_time <= 172.5 entropy = 0.281 samples = 41 value = [2, 39] 1890->1891 1898 arrival_date_day_of_month <= 7.5 entropy = 0.772 samples = 75 value = [17, 58] 1890->1898 1892 entropy = 0.0 samples = 32 value = [0, 32] 1891->1892 1893 arrival_date_week_number <= 21.5 entropy = 0.764 samples = 9 value = [2, 7] 1891->1893 1894 reserved_room_type_A <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1893->1894 1897 entropy = 0.0 samples = 6 value = [0, 6] 1893->1897 1895 entropy = 0.0 samples = 1 value = [0, 1] 1894->1895 1896 entropy = 0.0 samples = 2 value = [2, 0] 1894->1896 1899 entropy = 0.0 samples = 9 value = [0, 9] 1898->1899 1900 adr <= 135.645 entropy = 0.823 samples = 66 value = [17, 49] 1898->1900 1901 stays_in_weekend_nights <= 0.5 entropy = 0.991 samples = 9 value = [5, 4] 1900->1901 1906 lead_time <= 44.5 entropy = 0.742 samples = 57 value = [12, 45] 1900->1906 1902 entropy = 0.0 samples = 3 value = [0, 3] 1901->1902 1903 arrival_date_month_May <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 1901->1903 1904 entropy = 0.0 samples = 5 value = [5, 0] 1903->1904 1905 entropy = 0.0 samples = 1 value = [0, 1] 1903->1905 1907 adr <= 148.0 entropy = 0.932 samples = 23 value = [8, 15] 1906->1907 1918 arrival_date_month_December <= 0.5 entropy = 0.523 samples = 34 value = [4, 30] 1906->1918 1908 country_DEU <= 0.5 entropy = 0.439 samples = 11 value = [1, 10] 1907->1908 1911 adults <= 2.5 entropy = 0.98 samples = 12 value = [7, 5] 1907->1911 1909 entropy = 0.0 samples = 9 value = [0, 9] 1908->1909 1910 entropy = 1.0 samples = 2 value = [1, 1] 1908->1910 1912 lead_time <= 28.5 entropy = 0.881 samples = 10 value = [7, 3] 1911->1912 1917 entropy = 0.0 samples = 2 value = [0, 2] 1911->1917 1913 arrival_date_week_number <= 28.5 entropy = 1.0 samples = 6 value = [3, 3] 1912->1913 1916 entropy = 0.0 samples = 4 value = [4, 0] 1912->1916 1914 entropy = 0.0 samples = 3 value = [3, 0] 1913->1914 1915 entropy = 0.0 samples = 3 value = [0, 3] 1913->1915 1919 country_FRA <= 0.5 entropy = 0.439 samples = 33 value = [3, 30] 1918->1919 1930 entropy = 0.0 samples = 1 value = [1, 0] 1918->1930 1920 entropy = 0.0 samples = 13 value = [0, 13] 1919->1920 1921 adr <= 205.0 entropy = 0.61 samples = 20 value = [3, 17] 1919->1921 1922 adr <= 140.7 entropy = 0.485 samples = 19 value = [2, 17] 1921->1922 1929 entropy = 0.0 samples = 1 value = [1, 0] 1921->1929 1923 arrival_date_month_September <= 0.5 entropy = 0.811 samples = 8 value = [2, 6] 1922->1923 1928 entropy = 0.0 samples = 11 value = [0, 11] 1922->1928 1924 booking_changes <= 0.5 entropy = 0.592 samples = 7 value = [1, 6] 1923->1924 1927 entropy = 0.0 samples = 1 value = [1, 0] 1923->1927 1925 entropy = 0.0 samples = 6 value = [0, 6] 1924->1925 1926 entropy = 0.0 samples = 1 value = [1, 0] 1924->1926 1932 entropy = 0.0 samples = 3 value = [3, 0] 1931->1932 1933 entropy = 0.0 samples = 1 value = [0, 1] 1931->1933 1935 entropy = 0.0 samples = 16 value = [0, 16] 1934->1935 1936 entropy = 0.0 samples = 1 value = [1, 0] 1934->1936 1938 arrival_date_year <= 2016.5 entropy = 0.34 samples = 79 value = [5, 74] 1937->1938 1951 adr <= 93.205 entropy = 0.918 samples = 3 value = [2, 1] 1937->1951 1939 entropy = 0.0 samples = 41 value = [0, 41] 1938->1939 1940 adr <= 125.55 entropy = 0.562 samples = 38 value = [5, 33] 1938->1940 1941 assigned_room_type_A <= 0.5 entropy = 0.961 samples = 13 value = [5, 8] 1940->1941 1950 entropy = 0.0 samples = 25 value = [0, 25] 1940->1950 1942 entropy = 0.0 samples = 2 value = [2, 0] 1941->1942 1943 arrival_date_day_of_month <= 20.5 entropy = 0.845 samples = 11 value = [3, 8] 1941->1943 1944 entropy = 0.0 samples = 5 value = [0, 5] 1943->1944 1945 meal_SC <= 0.5 entropy = 1.0 samples = 6 value = [3, 3] 1943->1945 1946 stays_in_week_nights <= 1.5 entropy = 0.811 samples = 4 value = [1, 3] 1945->1946 1949 entropy = 0.0 samples = 2 value = [2, 0] 1945->1949 1947 entropy = 0.0 samples = 1 value = [1, 0] 1946->1947 1948 entropy = 0.0 samples = 3 value = [0, 3] 1946->1948 1952 entropy = 0.0 samples = 1 value = [0, 1] 1951->1952 1953 entropy = 0.0 samples = 2 value = [2, 0] 1951->1953 1955 arrival_date_week_number <= 46.5 entropy = 0.752 samples = 51 value = [11, 40] 1954->1955 1978 assigned_room_type_D <= 0.5 entropy = 1.0 samples = 34 value = [17, 17] 1954->1978 1956 adr <= 109.71 entropy = 0.601 samples = 41 value = [6, 35] 1955->1956 1969 adr <= 128.55 entropy = 1.0 samples = 10 value = [5, 5] 1955->1969 1957 entropy = 0.0 samples = 14 value = [0, 14] 1956->1957 1958 adr <= 115.075 entropy = 0.764 samples = 27 value = [6, 21] 1956->1958 1959 entropy = 0.0 samples = 2 value = [2, 0] 1958->1959 1960 stays_in_week_nights <= 1.5 entropy = 0.634 samples = 25 value = [4, 21] 1958->1960 1961 adr <= 140.0 entropy = 0.985 samples = 7 value = [3, 4] 1960->1961 1966 lead_time <= 13.0 entropy = 0.31 samples = 18 value = [1, 17] 1960->1966 1962 meal_SC <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 1961->1962 1965 entropy = 0.0 samples = 3 value = [0, 3] 1961->1965 1963 entropy = 0.0 samples = 3 value = [3, 0] 1962->1963 1964 entropy = 0.0 samples = 1 value = [0, 1] 1962->1964 1967 entropy = 0.0 samples = 1 value = [1, 0] 1966->1967 1968 entropy = 0.0 samples = 17 value = [0, 17] 1966->1968 1970 arrival_date_day_of_month <= 27.5 entropy = 0.954 samples = 8 value = [5, 3] 1969->1970 1977 entropy = 0.0 samples = 2 value = [0, 2] 1969->1977 1971 lead_time <= 80.5 entropy = 0.971 samples = 5 value = [2, 3] 1970->1971 1976 entropy = 0.0 samples = 3 value = [3, 0] 1970->1976 1972 entropy = 0.0 samples = 2 value = [0, 2] 1971->1972 1973 country_FRA <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 1971->1973 1974 entropy = 0.0 samples = 1 value = [0, 1] 1973->1974 1975 entropy = 0.0 samples = 2 value = [2, 0] 1973->1975 1979 adr <= 83.6 entropy = 0.946 samples = 22 value = [8, 14] 1978->1979 1994 stays_in_weekend_nights <= 1.5 entropy = 0.811 samples = 12 value = [9, 3] 1978->1994 1980 entropy = 0.0 samples = 3 value = [3, 0] 1979->1980 1981 total_of_special_requests <= 1.5 entropy = 0.831 samples = 19 value = [5, 14] 1979->1981 1982 country_FRA <= 0.5 entropy = 0.94 samples = 14 value = [5, 9] 1981->1982 1993 entropy = 0.0 samples = 5 value = [0, 5] 1981->1993 1983 stays_in_week_nights <= 1.5 entropy = 0.985 samples = 7 value = [4, 3] 1982->1983 1990 arrival_date_month_May <= 0.5 entropy = 0.592 samples = 7 value = [1, 6] 1982->1990 1984 entropy = 0.0 samples = 2 value = [2, 0] 1983->1984 1985 lead_time <= 117.5 entropy = 0.971 samples = 5 value = [2, 3] 1983->1985 1986 entropy = 0.0 samples = 2 value = [0, 2] 1985->1986 1987 lead_time <= 187.5 entropy = 0.918 samples = 3 value = [2, 1] 1985->1987 1988 entropy = 0.0 samples = 2 value = [2, 0] 1987->1988 1989 entropy = 0.0 samples = 1 value = [0, 1] 1987->1989 1991 entropy = 0.0 samples = 5 value = [0, 5] 1990->1991 1992 entropy = 1.0 samples = 2 value = [1, 1] 1990->1992 1995 entropy = 0.0 samples = 6 value = [6, 0] 1994->1995 1996 stays_in_week_nights <= 4.0 entropy = 1.0 samples = 6 value = [3, 3] 1994->1996 1997 adr <= 104.095 entropy = 0.811 samples = 4 value = [3, 1] 1996->1997 2000 entropy = 0.0 samples = 2 value = [0, 2] 1996->2000 1998 entropy = 0.0 samples = 1 value = [0, 1] 1997->1998 1999 entropy = 0.0 samples = 3 value = [3, 0] 1997->1999 2002 arrival_date_week_number <= 42.5 entropy = 0.853 samples = 751 value = [209, 542] 2001->2002 2337 arrival_date_year <= 2016.5 entropy = 0.955 samples = 671 value = [252, 419] 2001->2337 2003 arrival_date_year <= 2016.5 entropy = 0.805 samples = 581 value = [143, 438] 2002->2003 2236 stays_in_week_nights <= 3.5 entropy = 0.964 samples = 170 value = [66, 104] 2002->2236 2004 adr <= 90.95 entropy = 0.635 samples = 243 value = [39, 204] 2003->2004 2071 booking_changes <= 0.5 entropy = 0.89 samples = 338 value = [104, 234] 2003->2071 2005 arrival_date_week_number <= 9.5 entropy = 0.389 samples = 131 value = [10, 121] 2004->2005 2028 arrival_date_week_number <= 18.5 entropy = 0.825 samples = 112 value = [29, 83] 2004->2028 2006 hotel_City Hotel <= 0.5 entropy = 0.811 samples = 28 value = [7, 21] 2005->2006 2015 arrival_date_week_number <= 19.5 entropy = 0.19 samples = 103 value = [3, 100] 2005->2015 2007 entropy = 0.0 samples = 11 value = [0, 11] 2006->2007 2008 adr <= 68.5 entropy = 0.977 samples = 17 value = [7, 10] 2006->2008 2009 entropy = 0.0 samples = 4 value = [4, 0] 2008->2009 2010 arrival_date_week_number <= 8.5 entropy = 0.779 samples = 13 value = [3, 10] 2008->2010 2011 entropy = 0.0 samples = 9 value = [0, 9] 2010->2011 2012 booking_changes <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 2010->2012 2013 entropy = 0.0 samples = 3 value = [3, 0] 2012->2013 2014 entropy = 0.0 samples = 1 value = [0, 1] 2012->2014 2016 adr <= 72.125 entropy = 0.323 samples = 51 value = [3, 48] 2015->2016 2027 entropy = 0.0 samples = 52 value = [0, 52] 2015->2027 2017 entropy = 0.0 samples = 24 value = [0, 24] 2016->2017 2018 adr <= 72.665 entropy = 0.503 samples = 27 value = [3, 24] 2016->2018 2019 entropy = 0.0 samples = 1 value = [1, 0] 2018->2019 2020 lead_time <= 55.0 entropy = 0.391 samples = 26 value = [2, 24] 2018->2020 2021 assigned_room_type_D <= 0.5 entropy = 0.684 samples = 11 value = [2, 9] 2020->2021 2026 entropy = 0.0 samples = 15 value = [0, 15] 2020->2026 2022 lead_time <= 48.5 entropy = 0.469 samples = 10 value = [1, 9] 2021->2022 2025 entropy = 0.0 samples = 1 value = [1, 0] 2021->2025 2023 entropy = 0.0 samples = 9 value = [0, 9] 2022->2023 2024 entropy = 0.0 samples = 1 value = [1, 0] 2022->2024 2029 lead_time <= 102.5 entropy = 0.999 samples = 33 value = [17, 16] 2028->2029 2048 stays_in_weekend_nights <= 1.5 entropy = 0.615 samples = 79 value = [12, 67] 2028->2048 2030 lead_time <= 83.0 entropy = 0.992 samples = 29 value = [13, 16] 2029->2030 2047 entropy = 0.0 samples = 4 value = [4, 0] 2029->2047 2031 adr <= 112.04 entropy = 0.999 samples = 25 value = [13, 12] 2030->2031 2046 entropy = 0.0 samples = 4 value = [0, 4] 2030->2046 2032 adr <= 96.85 entropy = 0.982 samples = 19 value = [8, 11] 2031->2032 2043 adr <= 115.7 entropy = 0.65 samples = 6 value = [5, 1] 2031->2043 2033 adr <= 93.915 entropy = 0.811 samples = 8 value = [6, 2] 2032->2033 2038 lead_time <= 69.5 entropy = 0.684 samples = 11 value = [2, 9] 2032->2038 2034 arrival_date_day_of_month <= 13.0 entropy = 1.0 samples = 4 value = [2, 2] 2033->2034 2037 entropy = 0.0 samples = 4 value = [4, 0] 2033->2037 2035 entropy = 0.0 samples = 2 value = [2, 0] 2034->2035 2036 entropy = 0.0 samples = 2 value = [0, 2] 2034->2036 2039 entropy = 0.0 samples = 7 value = [0, 7] 2038->2039 2040 arrival_date_week_number <= 15.5 entropy = 1.0 samples = 4 value = [2, 2] 2038->2040 2041 entropy = 0.0 samples = 2 value = [0, 2] 2040->2041 2042 entropy = 0.0 samples = 2 value = [2, 0] 2040->2042 2044 entropy = 0.0 samples = 5 value = [5, 0] 2043->2044 2045 entropy = 0.0 samples = 1 value = [0, 1] 2043->2045 2049 lead_time <= 158.0 entropy = 0.757 samples = 55 value = [12, 43] 2048->2049 2070 entropy = 0.0 samples = 24 value = [0, 24] 2048->2070 2050 arrival_date_day_of_month <= 20.0 entropy = 0.592 samples = 42 value = [6, 36] 2049->2050 2063 arrival_date_day_of_month <= 10.5 entropy = 0.996 samples = 13 value = [6, 7] 2049->2063 2051 reserved_room_type_D <= 0.5 entropy = 0.222 samples = 28 value = [1, 27] 2050->2051 2054 hotel_City Hotel <= 0.5 entropy = 0.94 samples = 14 value = [5, 9] 2050->2054 2052 entropy = 0.0 samples = 26 value = [0, 26] 2051->2052 2053 entropy = 1.0 samples = 2 value = [1, 1] 2051->2053 2055 entropy = 0.0 samples = 2 value = [2, 0] 2054->2055 2056 arrival_date_day_of_month <= 22.0 entropy = 0.811 samples = 12 value = [3, 9] 2054->2056 2057 entropy = 0.0 samples = 1 value = [1, 0] 2056->2057 2058 arrival_date_week_number <= 35.5 entropy = 0.684 samples = 11 value = [2, 9] 2056->2058 2059 entropy = 0.0 samples = 7 value = [0, 7] 2058->2059 2060 adults <= 1.5 entropy = 1.0 samples = 4 value = [2, 2] 2058->2060 2061 entropy = 0.0 samples = 2 value = [0, 2] 2060->2061 2062 entropy = 0.0 samples = 2 value = [2, 0] 2060->2062 2064 stays_in_week_nights <= 1.5 entropy = 0.65 samples = 6 value = [5, 1] 2063->2064 2067 arrival_date_week_number <= 39.0 entropy = 0.592 samples = 7 value = [1, 6] 2063->2067 2065 entropy = 0.0 samples = 1 value = [0, 1] 2064->2065 2066 entropy = 0.0 samples = 5 value = [5, 0] 2064->2066 2068 entropy = 0.0 samples = 5 value = [0, 5] 2067->2068 2069 entropy = 1.0 samples = 2 value = [1, 1] 2067->2069 2072 assigned_room_type_A <= 0.5 entropy = 0.929 samples = 276 value = [95, 181] 2071->2072 2215 assigned_room_type_B <= 0.5 entropy = 0.598 samples = 62 value = [9, 53] 2071->2215 2073 adr <= 63.07 entropy = 0.684 samples = 77 value = [14, 63] 2072->2073 2100 arrival_date_week_number <= 22.5 entropy = 0.975 samples = 199 value = [81, 118] 2072->2100 2074 adr <= 46.6 entropy = 0.971 samples = 20 value = [8, 12] 2073->2074 2081 lead_time <= 57.0 entropy = 0.485 samples = 57 value = [6, 51] 2073->2081 2075 entropy = 0.0 samples = 7 value = [0, 7] 2074->2075 2076 arrival_date_day_of_month <= 8.0 entropy = 0.961 samples = 13 value = [8, 5] 2074->2076 2077 entropy = 0.0 samples = 3 value = [0, 3] 2076->2077 2078 arrival_date_week_number <= 12.0 entropy = 0.722 samples = 10 value = [8, 2] 2076->2078 2079 entropy = 0.0 samples = 8 value = [8, 0] 2078->2079 2080 entropy = 0.0 samples = 2 value = [0, 2] 2078->2080 2082 entropy = 0.0 samples = 18 value = [0, 18] 2081->2082 2083 lead_time <= 68.5 entropy = 0.619 samples = 39 value = [6, 33] 2081->2083 2084 hotel_Resort Hotel <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 2083->2084 2087 children <= 1.5 entropy = 0.503 samples = 36 value = [4, 32] 2083->2087 2085 entropy = 0.0 samples = 1 value = [0, 1] 2084->2085 2086 entropy = 0.0 samples = 2 value = [2, 0] 2084->2086 2088 arrival_date_day_of_month <= 1.5 entropy = 0.33 samples = 33 value = [2, 31] 2087->2088 2097 stays_in_weekend_nights <= 1.5 entropy = 0.918 samples = 3 value = [2, 1] 2087->2097 2089 entropy = 1.0 samples = 2 value = [1, 1] 2088->2089 2090 arrival_date_month_March <= 0.5 entropy = 0.206 samples = 31 value = [1, 30] 2088->2090 2091 entropy = 0.0 samples = 24 value = [0, 24] 2090->2091 2092 lead_time <= 96.0 entropy = 0.592 samples = 7 value = [1, 6] 2090->2092 2093 entropy = 0.0 samples = 3 value = [0, 3] 2092->2093 2094 lead_time <= 116.5 entropy = 0.811 samples = 4 value = [1, 3] 2092->2094 2095 entropy = 0.0 samples = 1 value = [1, 0] 2094->2095 2096 entropy = 0.0 samples = 3 value = [0, 3] 2094->2096 2098 entropy = 0.0 samples = 1 value = [0, 1] 2097->2098 2099 entropy = 0.0 samples = 2 value = [2, 0] 2097->2099 2101 lead_time <= 35.5 entropy = 0.991 samples = 151 value = [67, 84] 2100->2101 2184 lead_time <= 13.0 entropy = 0.871 samples = 48 value = [14, 34] 2100->2184 2102 lead_time <= 33.5 entropy = 0.863 samples = 35 value = [10, 25] 2101->2102 2119 stays_in_week_nights <= 2.5 entropy = 1.0 samples = 116 value = [57, 59] 2101->2119 2103 arrival_date_day_of_month <= 18.5 entropy = 0.929 samples = 29 value = [10, 19] 2102->2103 2118 entropy = 0.0 samples = 6 value = [0, 6] 2102->2118 2104 arrival_date_day_of_month <= 3.0 entropy = 0.998 samples = 19 value = [9, 10] 2103->2104 2113 lead_time <= 25.5 entropy = 0.469 samples = 10 value = [1, 9] 2103->2113 2105 entropy = 0.0 samples = 4 value = [0, 4] 2104->2105 2106 arrival_date_month_February <= 0.5 entropy = 0.971 samples = 15 value = [9, 6] 2104->2106 2107 adr <= 101.405 entropy = 0.722 samples = 10 value = [8, 2] 2106->2107 2110 stays_in_weekend_nights <= 1.5 entropy = 0.722 samples = 5 value = [1, 4] 2106->2110 2108 entropy = 0.0 samples = 8 value = [8, 0] 2107->2108 2109 entropy = 0.0 samples = 2 value = [0, 2] 2107->2109 2111 entropy = 0.0 samples = 4 value = [0, 4] 2110->2111 2112 entropy = 0.0 samples = 1 value = [1, 0] 2110->2112 2114 entropy = 0.0 samples = 7 value = [0, 7] 2113->2114 2115 stays_in_weekend_nights <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 2113->2115 2116 entropy = 0.0 samples = 2 value = [0, 2] 2115->2116 2117 entropy = 0.0 samples = 1 value = [1, 0] 2115->2117 2120 adr <= 68.275 entropy = 0.958 samples = 50 value = [31, 19] 2119->2120 2145 stays_in_week_nights <= 6.0 entropy = 0.967 samples = 66 value = [26, 40] 2119->2145 2121 hotel_City Hotel <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 2120->2121 2124 arrival_date_week_number <= 7.5 entropy = 0.902 samples = 44 value = [30, 14] 2120->2124 2122 entropy = 0.0 samples = 1 value = [1, 0] 2121->2122 2123 entropy = 0.0 samples = 5 value = [0, 5] 2121->2123 2125 entropy = 0.0 samples = 8 value = [8, 0] 2124->2125 2126 arrival_date_week_number <= 9.5 entropy = 0.964 samples = 36 value = [22, 14] 2124->2126 2127 arrival_date_day_of_month <= 24.0 entropy = 0.811 samples = 4 value = [1, 3] 2126->2127 2130 adr <= 107.55 entropy = 0.928 samples = 32 value = [21, 11] 2126->2130 2128 entropy = 0.0 samples = 3 value = [0, 3] 2127->2128 2129 entropy = 0.0 samples = 1 value = [1, 0] 2127->2129 2131 adr <= 86.95 entropy = 0.742 samples = 19 value = [15, 4] 2130->2131 2138 stays_in_weekend_nights <= 1.5 entropy = 0.996 samples = 13 value = [6, 7] 2130->2138 2132 stays_in_weekend_nights <= 0.5 entropy = 0.991 samples = 9 value = [5, 4] 2131->2132 2137 entropy = 0.0 samples = 10 value = [10, 0] 2131->2137 2133 entropy = 0.0 samples = 3 value = [3, 0] 2132->2133 2134 arrival_date_day_of_month <= 12.5 entropy = 0.918 samples = 6 value = [2, 4] 2132->2134 2135 entropy = 0.0 samples = 2 value = [2, 0] 2134->2135 2136 entropy = 0.0 samples = 4 value = [0, 4] 2134->2136 2139 arrival_date_day_of_month <= 2.5 entropy = 0.592 samples = 7 value = [1, 6] 2138->2139 2142 arrival_date_month_May <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 2138->2142 2140 entropy = 0.0 samples = 1 value = [1, 0] 2139->2140 2141 entropy = 0.0 samples = 6 value = [0, 6] 2139->2141 2143 entropy = 0.0 samples = 5 value = [5, 0] 2142->2143 2144 entropy = 0.0 samples = 1 value = [0, 1] 2142->2144 2146 meal_HB <= 0.5 entropy = 0.954 samples = 64 value = [24, 40] 2145->2146 2183 entropy = 0.0 samples = 2 value = [2, 0] 2145->2183 2147 adr <= 39.9 entropy = 0.938 samples = 62 value = [22, 40] 2146->2147 2182 entropy = 0.0 samples = 2 value = [2, 0] 2146->2182 2148 entropy = 0.0 samples = 2 value = [2, 0] 2147->2148 2149 children <= 0.5 entropy = 0.918 samples = 60 value = [20, 40] 2147->2149 2150 stays_in_weekend_nights <= 0.5 entropy = 0.894 samples = 58 value = [18, 40] 2149->2150 2181 entropy = 0.0 samples = 2 value = [2, 0] 2149->2181 2151 lead_time <= 251.0 entropy = 0.99 samples = 25 value = [11, 14] 2150->2151 2166 arrival_date_day_of_month <= 7.0 entropy = 0.746 samples = 33 value = [7, 26] 2150->2166 2152 lead_time <= 56.5 entropy = 0.946 samples = 22 value = [8, 14] 2151->2152 2165 entropy = 0.0 samples = 3 value = [3, 0] 2151->2165 2153 entropy = 0.0 samples = 2 value = [2, 0] 2152->2153 2154 lead_time <= 88.0 entropy = 0.881 samples = 20 value = [6, 14] 2152->2154 2155 entropy = 0.0 samples = 6 value = [0, 6] 2154->2155 2156 lead_time <= 130.5 entropy = 0.985 samples = 14 value = [6, 8] 2154->2156 2157 adr <= 79.75 entropy = 0.722 samples = 5 value = [4, 1] 2156->2157 2160 arrival_date_day_of_month <= 12.0 entropy = 0.764 samples = 9 value = [2, 7] 2156->2160 2158 entropy = 0.0 samples = 1 value = [0, 1] 2157->2158 2159 entropy = 0.0 samples = 4 value = [4, 0] 2157->2159 2161 entropy = 0.0 samples = 5 value = [0, 5] 2160->2161 2162 meal_BB <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 2160->2162 2163 entropy = 0.0 samples = 2 value = [2, 0] 2162->2163 2164 entropy = 0.0 samples = 2 value = [0, 2] 2162->2164 2167 entropy = 0.0 samples = 8 value = [0, 8] 2166->2167 2168 arrival_date_day_of_month <= 8.5 entropy = 0.855 samples = 25 value = [7, 18] 2166->2168 2169 entropy = 0.0 samples = 2 value = [2, 0] 2168->2169 2170 adr <= 79.425 entropy = 0.755 samples = 23 value = [5, 18] 2168->2170 2171 adr <= 53.4 entropy = 0.961 samples = 13 value = [5, 8] 2170->2171 2180 entropy = 0.0 samples = 10 value = [0, 10] 2170->2180 2172 entropy = 0.0 samples = 4 value = [0, 4] 2171->2172 2173 adr <= 71.91 entropy = 0.991 samples = 9 value = [5, 4] 2171->2173 2174 entropy = 0.0 samples = 3 value = [3, 0] 2173->2174 2175 stays_in_weekend_nights <= 1.5 entropy = 0.918 samples = 6 value = [2, 4] 2173->2175 2176 arrival_date_week_number <= 6.0 entropy = 0.918 samples = 3 value = [2, 1] 2175->2176 2179 entropy = 0.0 samples = 3 value = [0, 3] 2175->2179 2177 entropy = 0.0 samples = 1 value = [0, 1] 2176->2177 2178 entropy = 0.0 samples = 2 value = [2, 0] 2176->2178 2185 entropy = 0.0 samples = 2 value = [2, 0] 2184->2185 2186 lead_time <= 64.5 entropy = 0.828 samples = 46 value = [12, 34] 2184->2186 2187 entropy = 0.0 samples = 8 value = [0, 8] 2186->2187 2188 lead_time <= 77.5 entropy = 0.9 samples = 38 value = [12, 26] 2186->2188 2189 entropy = 0.0 samples = 2 value = [2, 0] 2188->2189 2190 lead_time <= 173.5 entropy = 0.852 samples = 36 value = [10, 26] 2188->2190 2191 entropy = 0.0 samples = 9 value = [0, 9] 2190->2191 2192 adr <= 84.6 entropy = 0.951 samples = 27 value = [10, 17] 2190->2192 2193 entropy = 0.0 samples = 4 value = [0, 4] 2192->2193 2194 adr <= 114.75 entropy = 0.988 samples = 23 value = [10, 13] 2192->2194 2195 adults <= 1.5 entropy = 0.998 samples = 21 value = [10, 11] 2194->2195 2214 entropy = 0.0 samples = 2 value = [0, 2] 2194->2214 2196 entropy = 0.0 samples = 1 value = [1, 0] 2195->2196 2197 lead_time <= 293.0 entropy = 0.993 samples = 20 value = [9, 11] 2195->2197 2198 stays_in_week_nights <= 2.5 entropy = 1.0 samples = 18 value = [9, 9] 2197->2198 2213 entropy = 0.0 samples = 2 value = [0, 2] 2197->2213 2199 arrival_date_week_number <= 28.5 entropy = 0.811 samples = 8 value = [2, 6] 2198->2199 2206 stays_in_week_nights <= 5.5 entropy = 0.881 samples = 10 value = [7, 3] 2198->2206 2200 stays_in_week_nights <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 2199->2200 2205 entropy = 0.0 samples = 4 value = [0, 4] 2199->2205 2201 entropy = 0.0 samples = 1 value = [1, 0] 2200->2201 2202 total_of_special_requests <= 1.5 entropy = 0.918 samples = 3 value = [1, 2] 2200->2202 2203 entropy = 0.0 samples = 2 value = [0, 2] 2202->2203 2204 entropy = 0.0 samples = 1 value = [1, 0] 2202->2204 2207 adr <= 103.68 entropy = 0.764 samples = 9 value = [7, 2] 2206->2207 2212 entropy = 0.0 samples = 1 value = [0, 1] 2206->2212 2208 total_of_special_requests <= 1.5 entropy = 0.971 samples = 5 value = [3, 2] 2207->2208 2211 entropy = 0.0 samples = 4 value = [4, 0] 2207->2211 2209 entropy = 0.0 samples = 2 value = [0, 2] 2208->2209 2210 entropy = 0.0 samples = 3 value = [3, 0] 2208->2210 2216 adr <= 86.5 entropy = 0.52 samples = 60 value = [7, 53] 2215->2216 2235 entropy = 0.0 samples = 2 value = [2, 0] 2215->2235 2217 entropy = 0.0 samples = 19 value = [0, 19] 2216->2217 2218 adr <= 89.05 entropy = 0.659 samples = 41 value = [7, 34] 2216->2218 2219 entropy = 0.0 samples = 2 value = [2, 0] 2218->2219 2220 total_of_special_requests <= 1.5 entropy = 0.552 samples = 39 value = [5, 34] 2218->2220 2221 arrival_date_week_number <= 10.5 entropy = 0.722 samples = 25 value = [5, 20] 2220->2221 2234 entropy = 0.0 samples = 14 value = [0, 14] 2220->2234 2222 stays_in_week_nights <= 4.5 entropy = 0.918 samples = 3 value = [2, 1] 2221->2222 2225 meal_BB <= 0.5 entropy = 0.575 samples = 22 value = [3, 19] 2221->2225 2223 entropy = 0.0 samples = 2 value = [2, 0] 2222->2223 2224 entropy = 0.0 samples = 1 value = [0, 1] 2222->2224 2226 lead_time <= 124.5 entropy = 0.918 samples = 9 value = [3, 6] 2225->2226 2233 entropy = 0.0 samples = 13 value = [0, 13] 2225->2233 2227 entropy = 0.0 samples = 3 value = [0, 3] 2226->2227 2228 stays_in_weekend_nights <= 1.5 entropy = 1.0 samples = 6 value = [3, 3] 2226->2228 2229 arrival_date_month_August <= 0.5 entropy = 0.811 samples = 4 value = [3, 1] 2228->2229 2232 entropy = 0.0 samples = 2 value = [0, 2] 2228->2232 2230 entropy = 0.0 samples = 3 value = [3, 0] 2229->2230 2231 entropy = 0.0 samples = 1 value = [0, 1] 2229->2231 2237 hotel_Resort Hotel <= 0.5 entropy = 0.907 samples = 124 value = [40, 84] 2236->2237 2308 adr <= 74.375 entropy = 0.988 samples = 46 value = [26, 20] 2236->2308 2238 lead_time <= 181.5 entropy = 0.807 samples = 93 value = [23, 70] 2237->2238 2291 lead_time <= 88.5 entropy = 0.993 samples = 31 value = [17, 14] 2237->2291 2239 adr <= 79.1 entropy = 0.722 samples = 80 value = [16, 64] 2238->2239 2284 reserved_room_type_A <= 0.5 entropy = 0.996 samples = 13 value = [7, 6] 2238->2284 2240 entropy = 0.0 samples = 12 value = [0, 12] 2239->2240 2241 arrival_date_day_of_month <= 23.5 entropy = 0.787 samples = 68 value = [16, 52] 2239->2241 2242 adr <= 107.82 entropy = 0.867 samples = 52 value = [15, 37] 2241->2242 2279 stays_in_weekend_nights <= 0.5 entropy = 0.337 samples = 16 value = [1, 15] 2241->2279 2243 adr <= 105.1 entropy = 0.903 samples = 47 value = [15, 32] 2242->2243 2278 entropy = 0.0 samples = 5 value = [0, 5] 2242->2278 2244 arrival_date_day_of_month <= 7.5 entropy = 0.867 samples = 45 value = [13, 32] 2243->2244 2277 entropy = 0.0 samples = 2 value = [2, 0] 2243->2277 2245 adr <= 88.345 entropy = 0.523 samples = 17 value = [2, 15] 2244->2245 2256 lead_time <= 48.0 entropy = 0.967 samples = 28 value = [11, 17] 2244->2256 2246 arrival_date_day_of_month <= 2.5 entropy = 0.811 samples = 8 value = [2, 6] 2245->2246 2255 entropy = 0.0 samples = 9 value = [0, 9] 2245->2255 2247 entropy = 0.0 samples = 2 value = [0, 2] 2246->2247 2248 stays_in_week_nights <= 1.5 entropy = 0.918 samples = 6 value = [2, 4] 2246->2248 2249 entropy = 0.0 samples = 2 value = [0, 2] 2248->2249 2250 lead_time <= 45.5 entropy = 1.0 samples = 4 value = [2, 2] 2248->2250 2251 entropy = 0.0 samples = 1 value = [1, 0] 2250->2251 2252 adr <= 83.6 entropy = 0.918 samples = 3 value = [1, 2] 2250->2252 2253 entropy = 0.0 samples = 1 value = [1, 0] 2252->2253 2254 entropy = 0.0 samples = 2 value = [0, 2] 2252->2254 2257 adr <= 88.2 entropy = 0.918 samples = 9 value = [6, 3] 2256->2257 2264 arrival_date_day_of_month <= 11.5 entropy = 0.831 samples = 19 value = [5, 14] 2256->2264 2258 entropy = 0.0 samples = 4 value = [4, 0] 2257->2258 2259 adr <= 94.65 entropy = 0.971 samples = 5 value = [2, 3] 2257->2259 2260 entropy = 0.0 samples = 2 value = [0, 2] 2259->2260 2261 total_of_special_requests <= 1.5 entropy = 0.918 samples = 3 value = [2, 1] 2259->2261 2262 entropy = 0.0 samples = 1 value = [0, 1] 2261->2262 2263 entropy = 0.0 samples = 2 value = [2, 0] 2261->2263 2265 entropy = 0.0 samples = 6 value = [0, 6] 2264->2265 2266 stays_in_week_nights <= 0.5 entropy = 0.961 samples = 13 value = [5, 8] 2264->2266 2267 entropy = 0.0 samples = 2 value = [0, 2] 2266->2267 2268 lead_time <= 55.5 entropy = 0.994 samples = 11 value = [5, 6] 2266->2268 2269 entropy = 0.0 samples = 2 value = [0, 2] 2268->2269 2270 lead_time <= 69.0 entropy = 0.991 samples = 9 value = [5, 4] 2268->2270 2271 entropy = 0.0 samples = 2 value = [2, 0] 2270->2271 2272 arrival_date_day_of_month <= 16.0 entropy = 0.985 samples = 7 value = [3, 4] 2270->2272 2273 booking_changes <= 1.0 entropy = 0.811 samples = 4 value = [3, 1] 2272->2273 2276 entropy = 0.0 samples = 3 value = [0, 3] 2272->2276 2274 entropy = 0.0 samples = 2 value = [2, 0] 2273->2274 2275 entropy = 1.0 samples = 2 value = [1, 1] 2273->2275 2280 meal_BB <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 2279->2280 2283 entropy = 0.0 samples = 11 value = [0, 11] 2279->2283 2281 entropy = 1.0 samples = 2 value = [1, 1] 2280->2281 2282 entropy = 0.0 samples = 3 value = [0, 3] 2280->2282 2285 entropy = 0.0 samples = 3 value = [0, 3] 2284->2285 2286 adr <= 75.825 entropy = 0.881 samples = 10 value = [7, 3] 2284->2286 2287 stays_in_weekend_nights <= 1.5 entropy = 0.811 samples = 4 value = [1, 3] 2286->2287 2290 entropy = 0.0 samples = 6 value = [6, 0] 2286->2290 2288 entropy = 0.0 samples = 2 value = [0, 2] 2287->2288 2289 entropy = 1.0 samples = 2 value = [1, 1] 2287->2289 2292 total_of_special_requests <= 1.5 entropy = 0.99 samples = 25 value = [11, 14] 2291->2292 2307 entropy = 0.0 samples = 6 value = [6, 0] 2291->2307 2293 lead_time <= 28.5 entropy = 0.993 samples = 20 value = [11, 9] 2292->2293 2306 entropy = 0.0 samples = 5 value = [0, 5] 2292->2306 2294 assigned_room_type_D <= 0.5 entropy = 0.764 samples = 9 value = [2, 7] 2293->2294 2301 children <= 1.0 entropy = 0.684 samples = 11 value = [9, 2] 2293->2301 2295 adr <= 51.945 entropy = 0.544 samples = 8 value = [1, 7] 2294->2295 2300 entropy = 0.0 samples = 1 value = [1, 0] 2294->2300 2296 entropy = 0.0 samples = 4 value = [0, 4] 2295->2296 2297 assigned_room_type_E <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 2295->2297 2298 entropy = 1.0 samples = 2 value = [1, 1] 2297->2298 2299 entropy = 0.0 samples = 2 value = [0, 2] 2297->2299 2302 lead_time <= 79.5 entropy = 0.469 samples = 10 value = [9, 1] 2301->2302 2305 entropy = 0.0 samples = 1 value = [0, 1] 2301->2305 2303 entropy = 0.0 samples = 9 value = [9, 0] 2302->2303 2304 entropy = 0.0 samples = 1 value = [0, 1] 2302->2304 2309 arrival_date_day_of_month <= 27.5 entropy = 0.959 samples = 21 value = [8, 13] 2308->2309 2320 arrival_date_week_number <= 48.5 entropy = 0.855 samples = 25 value = [18, 7] 2308->2320 2310 arrival_date_day_of_month <= 25.5 entropy = 1.0 samples = 16 value = [8, 8] 2309->2310 2319 entropy = 0.0 samples = 5 value = [0, 5] 2309->2319 2311 total_of_special_requests <= 1.5 entropy = 0.918 samples = 12 value = [4, 8] 2310->2311 2318 entropy = 0.0 samples = 4 value = [4, 0] 2310->2318 2312 reserved_room_type_E <= 0.5 entropy = 0.544 samples = 8 value = [1, 7] 2311->2312 2315 arrival_date_day_of_month <= 21.5 entropy = 0.811 samples = 4 value = [3, 1] 2311->2315 2313 entropy = 0.0 samples = 6 value = [0, 6] 2312->2313 2314 entropy = 1.0 samples = 2 value = [1, 1] 2312->2314 2316 entropy = 0.0 samples = 3 value = [3, 0] 2315->2316 2317 entropy = 0.0 samples = 1 value = [0, 1] 2315->2317 2321 lead_time <= 50.0 entropy = 0.985 samples = 14 value = [8, 6] 2320->2321 2332 adr <= 92.25 entropy = 0.439 samples = 11 value = [10, 1] 2320->2332 2322 entropy = 0.0 samples = 3 value = [3, 0] 2321->2322 2323 arrival_date_month_November <= 0.5 entropy = 0.994 samples = 11 value = [5, 6] 2321->2323 2324 lead_time <= 242.0 entropy = 0.863 samples = 7 value = [5, 2] 2323->2324 2331 entropy = 0.0 samples = 4 value = [0, 4] 2323->2331 2325 lead_time <= 176.5 entropy = 0.65 samples = 6 value = [5, 1] 2324->2325 2330 entropy = 0.0 samples = 1 value = [0, 1] 2324->2330 2326 entropy = 0.0 samples = 3 value = [3, 0] 2325->2326 2327 arrival_date_week_number <= 43.5 entropy = 0.918 samples = 3 value = [2, 1] 2325->2327 2328 entropy = 0.0 samples = 2 value = [2, 0] 2327->2328 2329 entropy = 0.0 samples = 1 value = [0, 1] 2327->2329 2333 entropy = 0.0 samples = 8 value = [8, 0] 2332->2333 2334 adr <= 99.9 entropy = 0.918 samples = 3 value = [2, 1] 2332->2334 2335 entropy = 0.0 samples = 1 value = [0, 1] 2334->2335 2336 entropy = 0.0 samples = 2 value = [2, 0] 2334->2336 2338 arrival_date_week_number <= 37.5 entropy = 0.891 samples = 276 value = [85, 191] 2337->2338 2453 total_of_special_requests <= 1.5 entropy = 0.983 samples = 395 value = [167, 228] 2337->2453 2339 arrival_date_week_number <= 14.5 entropy = 0.744 samples = 180 value = [38, 142] 2338->2339 2408 stays_in_week_nights <= 1.5 entropy = 1.0 samples = 96 value = [47, 49] 2338->2408 2340 lead_time <= 29.0 entropy = 0.918 samples = 12 value = [8, 4] 2339->2340 2343 lead_time <= 26.0 entropy = 0.677 samples = 168 value = [30, 138] 2339->2343 2341 entropy = 0.0 samples = 4 value = [0, 4] 2340->2341 2342 entropy = 0.0 samples = 8 value = [8, 0] 2340->2342 2344 lead_time <= 23.0 entropy = 0.907 samples = 31 value = [10, 21] 2343->2344 2361 assigned_room_type_D <= 0.5 entropy = 0.6 samples = 137 value = [20, 117] 2343->2361 2345 assigned_room_type_A <= 0.5 entropy = 0.85 samples = 29 value = [8, 21] 2344->2345 2360 entropy = 0.0 samples = 2 value = [2, 0] 2344->2360 2346 lead_time <= 19.0 entropy = 0.523 samples = 17 value = [2, 15] 2345->2346 2353 arrival_date_day_of_month <= 20.0 entropy = 1.0 samples = 12 value = [6, 6] 2345->2353 2347 entropy = 0.0 samples = 10 value = [0, 10] 2346->2347 2348 arrival_date_day_of_month <= 17.0 entropy = 0.863 samples = 7 value = [2, 5] 2346->2348 2349 entropy = 0.0 samples = 1 value = [1, 0] 2348->2349 2350 arrival_date_month_June <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 2348->2350 2351 entropy = 0.0 samples = 5 value = [0, 5] 2350->2351 2352 entropy = 0.0 samples = 1 value = [1, 0] 2350->2352 2354 lead_time <= 12.5 entropy = 0.811 samples = 8 value = [2, 6] 2353->2354 2359 entropy = 0.0 samples = 4 value = [4, 0] 2353->2359 2355 stays_in_weekend_nights <= 0.5 entropy = 1.0 samples = 4 value = [2, 2] 2354->2355 2358 entropy = 0.0 samples = 4 value = [0, 4] 2354->2358 2356 entropy = 0.0 samples = 2 value = [2, 0] 2355->2356 2357 entropy = 0.0 samples = 2 value = [0, 2] 2355->2357 2362 lead_time <= 120.0 entropy = 0.418 samples = 71 value = [6, 65] 2361->2362 2381 lead_time <= 143.0 entropy = 0.746 samples = 66 value = [14, 52] 2361->2381 2363 total_of_special_requests <= 2.5 entropy = 0.559 samples = 46 value = [6, 40] 2362->2363 2380 entropy = 0.0 samples = 25 value = [0, 25] 2362->2380 2364 adr <= 118.98 entropy = 0.503 samples = 45 value = [5, 40] 2363->2364 2379 entropy = 0.0 samples = 1 value = [1, 0] 2363->2379 2365 entropy = 0.0 samples = 1 value = [1, 0] 2364->2365 2366 hotel_Resort Hotel <= 0.5 entropy = 0.439 samples = 44 value = [4, 40] 2364->2366 2367 arrival_date_day_of_month <= 28.0 entropy = 0.619 samples = 26 value = [4, 22] 2366->2367 2378 entropy = 0.0 samples = 18 value = [0, 18] 2366->2378 2368 meal_HB <= 0.5 entropy = 0.529 samples = 25 value = [3, 22] 2367->2368 2377 entropy = 0.0 samples = 1 value = [1, 0] 2367->2377 2369 arrival_date_week_number <= 34.5 entropy = 0.414 samples = 24 value = [2, 22] 2368->2369 2376 entropy = 0.0 samples = 1 value = [1, 0] 2368->2376 2370 entropy = 0.0 samples = 17 value = [0, 17] 2369->2370 2371 adr <= 194.85 entropy = 0.863 samples = 7 value = [2, 5] 2369->2371 2372 total_of_special_requests <= 1.5 entropy = 0.65 samples = 6 value = [1, 5] 2371->2372 2375 entropy = 0.0 samples = 1 value = [1, 0] 2371->2375 2373 entropy = 0.0 samples = 4 value = [0, 4] 2372->2373 2374 entropy = 1.0 samples = 2 value = [1, 1] 2372->2374 2382 adr <= 132.06 entropy = 0.598 samples = 55 value = [8, 47] 2381->2382 2401 adr <= 135.29 entropy = 0.994 samples = 11 value = [6, 5] 2381->2401 2383 entropy = 0.0 samples = 18 value = [0, 18] 2382->2383 2384 arrival_date_month_July <= 0.5 entropy = 0.753 samples = 37 value = [8, 29] 2382->2384 2385 arrival_date_day_of_month <= 25.5 entropy = 0.863 samples = 28 value = [8, 20] 2384->2385 2400 entropy = 0.0 samples = 9 value = [0, 9] 2384->2400 2386 arrival_date_day_of_month <= 19.5 entropy = 0.932 samples = 23 value = [8, 15] 2385->2386 2399 entropy = 0.0 samples = 5 value = [0, 5] 2385->2399 2387 arrival_date_day_of_month <= 9.0 entropy = 0.787 samples = 17 value = [4, 13] 2386->2387 2394 arrival_date_month_June <= 0.5 entropy = 0.918 samples = 6 value = [4, 2] 2386->2394 2388 lead_time <= 67.0 entropy = 0.991 samples = 9 value = [4, 5] 2387->2388 2393 entropy = 0.0 samples = 8 value = [0, 8] 2387->2393 2389 entropy = 0.0 samples = 3 value = [3, 0] 2388->2389 2390 arrival_date_month_May <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 2388->2390 2391 entropy = 0.0 samples = 5 value = [0, 5] 2390->2391 2392 entropy = 0.0 samples = 1 value = [1, 0] 2390->2392 2395 arrival_date_month_May <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 2394->2395 2398 entropy = 0.0 samples = 1 value = [0, 1] 2394->2398 2396 entropy = 0.0 samples = 3 value = [3, 0] 2395->2396 2397 entropy = 1.0 samples = 2 value = [1, 1] 2395->2397 2402 entropy = 0.0 samples = 4 value = [4, 0] 2401->2402 2403 arrival_date_month_June <= 0.5 entropy = 0.863 samples = 7 value = [2, 5] 2401->2403 2404 arrival_date_day_of_month <= 25.5 entropy = 0.65 samples = 6 value = [1, 5] 2403->2404 2407 entropy = 0.0 samples = 1 value = [1, 0] 2403->2407 2405 entropy = 0.0 samples = 5 value = [0, 5] 2404->2405 2406 entropy = 0.0 samples = 1 value = [1, 0] 2404->2406 2409 arrival_date_week_number <= 47.5 entropy = 0.907 samples = 31 value = [21, 10] 2408->2409 2420 arrival_date_week_number <= 43.5 entropy = 0.971 samples = 65 value = [26, 39] 2408->2420 2410 arrival_date_day_of_month <= 6.5 entropy = 0.971 samples = 25 value = [15, 10] 2409->2410 2419 entropy = 0.0 samples = 6 value = [6, 0] 2409->2419 2411 entropy = 0.0 samples = 5 value = [5, 0] 2410->2411 2412 adr <= 165.315 entropy = 1.0 samples = 20 value = [10, 10] 2410->2412 2413 lead_time <= 38.0 entropy = 0.918 samples = 15 value = [10, 5] 2412->2413 2418 entropy = 0.0 samples = 5 value = [0, 5] 2412->2418 2414 adr <= 158.25 entropy = 0.954 samples = 8 value = [3, 5] 2413->2414 2417 entropy = 0.0 samples = 7 value = [7, 0] 2413->2417 2415 entropy = 0.0 samples = 5 value = [0, 5] 2414->2415 2416 entropy = 0.0 samples = 3 value = [3, 0] 2414->2416 2421 adr <= 174.6 entropy = 0.998 samples = 40 value = [21, 19] 2420->2421 2444 booking_changes <= 2.0 entropy = 0.722 samples = 25 value = [5, 20] 2420->2444 2422 adr <= 149.2 entropy = 0.998 samples = 36 value = [17, 19] 2421->2422 2443 entropy = 0.0 samples = 4 value = [4, 0] 2421->2443 2423 lead_time <= 154.0 entropy = 0.918 samples = 21 value = [14, 7] 2422->2423 2432 arrival_date_week_number <= 39.5 entropy = 0.722 samples = 15 value = [3, 12] 2422->2432 2424 total_of_special_requests <= 1.5 entropy = 0.764 samples = 18 value = [14, 4] 2423->2424 2431 entropy = 0.0 samples = 3 value = [0, 3] 2423->2431 2425 arrival_date_day_of_month <= 21.5 entropy = 1.0 samples = 8 value = [4, 4] 2424->2425 2430 entropy = 0.0 samples = 10 value = [10, 0] 2424->2430 2426 arrival_date_week_number <= 38.5 entropy = 0.722 samples = 5 value = [4, 1] 2425->2426 2429 entropy = 0.0 samples = 3 value = [0, 3] 2425->2429 2427 entropy = 0.0 samples = 1 value = [0, 1] 2426->2427 2428 entropy = 0.0 samples = 4 value = [4, 0] 2426->2428 2433 entropy = 0.0 samples = 6 value = [0, 6] 2432->2433 2434 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 9 value = [3, 6] 2432->2434 2435 entropy = 0.0 samples = 3 value = [0, 3] 2434->2435 2436 arrival_date_week_number <= 40.5 entropy = 1.0 samples = 6 value = [3, 3] 2434->2436 2437 entropy = 0.0 samples = 1 value = [1, 0] 2436->2437 2438 lead_time <= 136.0 entropy = 0.971 samples = 5 value = [2, 3] 2436->2438 2439 lead_time <= 69.0 entropy = 0.811 samples = 4 value = [1, 3] 2438->2439 2442 entropy = 0.0 samples = 1 value = [1, 0] 2438->2442 2440 entropy = 1.0 samples = 2 value = [1, 1] 2439->2440 2441 entropy = 0.0 samples = 2 value = [0, 2] 2439->2441 2445 stays_in_weekend_nights <= 0.5 entropy = 0.559 samples = 23 value = [3, 20] 2444->2445 2452 entropy = 0.0 samples = 2 value = [2, 0] 2444->2452 2446 adr <= 130.45 entropy = 0.985 samples = 7 value = [3, 4] 2445->2446 2451 entropy = 0.0 samples = 16 value = [0, 16] 2445->2451 2447 entropy = 0.0 samples = 2 value = [2, 0] 2446->2447 2448 arrival_date_month_December <= 0.5 entropy = 0.722 samples = 5 value = [1, 4] 2446->2448 2449 entropy = 0.0 samples = 3 value = [0, 3] 2448->2449 2450 entropy = 1.0 samples = 2 value = [1, 1] 2448->2450 2454 arrival_date_week_number <= 15.5 entropy = 0.999 samples = 245 value = [119, 126] 2453->2454 2587 stays_in_weekend_nights <= 1.5 entropy = 0.904 samples = 150 value = [48, 102] 2453->2587 2455 assigned_room_type_E <= 0.5 entropy = 0.7 samples = 37 value = [7, 30] 2454->2455 2470 adr <= 183.665 entropy = 0.996 samples = 208 value = [112, 96] 2454->2470 2456 adr <= 119.5 entropy = 0.592 samples = 35 value = [5, 30] 2455->2456 2469 entropy = 0.0 samples = 2 value = [2, 0] 2455->2469 2457 entropy = 0.0 samples = 1 value = [1, 0] 2456->2457 2458 lead_time <= 268.5 entropy = 0.523 samples = 34 value = [4, 30] 2456->2458 2459 assigned_room_type_A <= 0.5 entropy = 0.439 samples = 33 value = [3, 30] 2458->2459 2468 entropy = 0.0 samples = 1 value = [1, 0] 2458->2468 2460 entropy = 0.0 samples = 20 value = [0, 20] 2459->2460 2461 stays_in_weekend_nights <= 1.5 entropy = 0.779 samples = 13 value = [3, 10] 2459->2461 2462 adr <= 144.6 entropy = 0.469 samples = 10 value = [1, 9] 2461->2462 2465 arrival_date_month_January <= 0.5 entropy = 0.918 samples = 3 value = [2, 1] 2461->2465 2463 entropy = 0.0 samples = 9 value = [0, 9] 2462->2463 2464 entropy = 0.0 samples = 1 value = [1, 0] 2462->2464 2466 entropy = 0.0 samples = 2 value = [2, 0] 2465->2466 2467 entropy = 0.0 samples = 1 value = [0, 1] 2465->2467 2471 adr <= 127.9 entropy = 0.997 samples = 148 value = [69, 79] 2470->2471 2560 stays_in_week_nights <= 3.5 entropy = 0.86 samples = 60 value = [43, 17] 2470->2560 2472 lead_time <= 202.5 entropy = 0.949 samples = 38 value = [24, 14] 2471->2472 2497 adr <= 129.96 entropy = 0.976 samples = 110 value = [45, 65] 2471->2497 2473 assigned_room_type_F <= 0.5 entropy = 0.983 samples = 33 value = [19, 14] 2472->2473 2496 entropy = 0.0 samples = 5 value = [5, 0] 2472->2496 2474 lead_time <= 144.5 entropy = 0.963 samples = 31 value = [19, 12] 2473->2474 2495 entropy = 0.0 samples = 2 value = [0, 2] 2473->2495 2475 arrival_date_week_number <= 21.5 entropy = 0.871 samples = 24 value = [17, 7] 2474->2475 2490 lead_time <= 159.0 entropy = 0.863 samples = 7 value = [2, 5] 2474->2490 2476 entropy = 0.0 samples = 7 value = [7, 0] 2475->2476 2477 lead_time <= 117.5 entropy = 0.977 samples = 17 value = [10, 7] 2475->2477 2478 lead_time <= 107.0 entropy = 1.0 samples = 14 value = [7, 7] 2477->2478 2489 entropy = 0.0 samples = 3 value = [3, 0] 2477->2489 2479 arrival_date_month_July <= 0.5 entropy = 0.946 samples = 11 value = [7, 4] 2478->2479 2488 entropy = 0.0 samples = 3 value = [0, 3] 2478->2488 2480 entropy = 0.0 samples = 4 value = [4, 0] 2479->2480 2481 lead_time <= 81.5 entropy = 0.985 samples = 7 value = [3, 4] 2479->2481 2482 entropy = 0.0 samples = 2 value = [0, 2] 2481->2482 2483 arrival_date_day_of_month <= 20.0 entropy = 0.971 samples = 5 value = [3, 2] 2481->2483 2484 entropy = 0.0 samples = 2 value = [2, 0] 2483->2484 2485 stays_in_week_nights <= 1.5 entropy = 0.918 samples = 3 value = [1, 2] 2483->2485 2486 entropy = 0.0 samples = 2 value = [0, 2] 2485->2486 2487 entropy = 0.0 samples = 1 value = [1, 0] 2485->2487 2491 entropy = 0.0 samples = 4 value = [0, 4] 2490->2491 2492 adr <= 123.0 entropy = 0.918 samples = 3 value = [2, 1] 2490->2492 2493 entropy = 0.0 samples = 1 value = [0, 1] 2492->2493 2494 entropy = 0.0 samples = 2 value = [2, 0] 2492->2494 2498 entropy = 0.0 samples = 4 value = [0, 4] 2497->2498 2499 adr <= 130.25 entropy = 0.984 samples = 106 value = [45, 61] 2497->2499 2500 entropy = 0.0 samples = 3 value = [3, 0] 2499->2500 2501 adr <= 132.005 entropy = 0.975 samples = 103 value = [42, 61] 2499->2501 2502 entropy = 0.0 samples = 5 value = [0, 5] 2501->2502 2503 arrival_date_day_of_month <= 5.5 entropy = 0.985 samples = 98 value = [42, 56] 2501->2503 2504 arrival_date_month_August <= 0.5 entropy = 0.795 samples = 25 value = [6, 19] 2503->2504 2519 arrival_date_week_number <= 22.5 entropy = 1.0 samples = 73 value = [36, 37] 2503->2519 2505 assigned_room_type_D <= 0.5 entropy = 0.9 samples = 19 value = [6, 13] 2504->2505 2518 entropy = 0.0 samples = 6 value = [0, 6] 2504->2518 2506 stays_in_weekend_nights <= 2.5 entropy = 0.954 samples = 16 value = [6, 10] 2505->2506 2517 entropy = 0.0 samples = 3 value = [0, 3] 2505->2517 2507 stays_in_week_nights <= 2.5 entropy = 0.996 samples = 13 value = [6, 7] 2506->2507 2516 entropy = 0.0 samples = 3 value = [0, 3] 2506->2516 2508 reserved_room_type_A <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 2507->2508 2511 lead_time <= 55.0 entropy = 0.863 samples = 7 value = [5, 2] 2507->2511 2509 entropy = 0.0 samples = 1 value = [1, 0] 2508->2509 2510 entropy = 0.0 samples = 5 value = [0, 5] 2508->2510 2512 entropy = 0.0 samples = 4 value = [4, 0] 2511->2512 2513 lead_time <= 162.5 entropy = 0.918 samples = 3 value = [1, 2] 2511->2513 2514 entropy = 0.0 samples = 2 value = [0, 2] 2513->2514 2515 entropy = 0.0 samples = 1 value = [1, 0] 2513->2515 2520 stays_in_weekend_nights <= 1.5 entropy = 0.811 samples = 16 value = [4, 12] 2519->2520 2529 adr <= 168.5 entropy = 0.989 samples = 57 value = [32, 25] 2519->2529 2521 meal_BB <= 0.5 entropy = 0.439 samples = 11 value = [1, 10] 2520->2521 2524 arrival_date_day_of_month <= 21.5 entropy = 0.971 samples = 5 value = [3, 2] 2520->2524 2522 entropy = 1.0 samples = 2 value = [1, 1] 2521->2522 2523 entropy = 0.0 samples = 9 value = [0, 9] 2521->2523 2525 lead_time <= 73.5 entropy = 0.918 samples = 3 value = [1, 2] 2524->2525 2528 entropy = 0.0 samples = 2 value = [2, 0] 2524->2528 2526 entropy = 0.0 samples = 2 value = [0, 2] 2525->2526 2527 entropy = 0.0 samples = 1 value = [1, 0] 2525->2527 2530 meal_BB <= 0.5 entropy = 0.954 samples = 48 value = [30, 18] 2529->2530 2555 lead_time <= 142.0 entropy = 0.764 samples = 9 value = [2, 7] 2529->2555 2531 lead_time <= 15.5 entropy = 0.65 samples = 12 value = [10, 2] 2530->2531 2536 stays_in_week_nights <= 4.5 entropy = 0.991 samples = 36 value = [20, 16] 2530->2536 2532 entropy = 0.0 samples = 1 value = [0, 1] 2531->2532 2533 stays_in_week_nights <= 3.5 entropy = 0.439 samples = 11 value = [10, 1] 2531->2533 2534 entropy = 0.0 samples = 10 value = [10, 0] 2533->2534 2535 entropy = 0.0 samples = 1 value = [0, 1] 2533->2535 2537 lead_time <= 66.5 entropy = 0.999 samples = 29 value = [14, 15] 2536->2537 2552 arrival_date_week_number <= 29.0 entropy = 0.592 samples = 7 value = [6, 1] 2536->2552 2538 arrival_date_day_of_month <= 12.5 entropy = 0.918 samples = 15 value = [10, 5] 2537->2538 2547 arrival_date_month_June <= 0.5 entropy = 0.863 samples = 14 value = [4, 10] 2537->2547 2539 adr <= 152.25 entropy = 0.991 samples = 9 value = [4, 5] 2538->2539 2546 entropy = 0.0 samples = 6 value = [6, 0] 2538->2546 2540 arrival_date_week_number <= 23.5 entropy = 0.918 samples = 6 value = [4, 2] 2539->2540 2545 entropy = 0.0 samples = 3 value = [0, 3] 2539->2545 2541 entropy = 0.0 samples = 3 value = [3, 0] 2540->2541 2542 assigned_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 2540->2542 2543 entropy = 0.0 samples = 2 value = [0, 2] 2542->2543 2544 entropy = 0.0 samples = 1 value = [1, 0] 2542->2544 2548 lead_time <= 210.5 entropy = 0.65 samples = 12 value = [2, 10] 2547->2548 2551 entropy = 0.0 samples = 2 value = [2, 0] 2547->2551 2549 entropy = 0.0 samples = 10 value = [0, 10] 2548->2549 2550 entropy = 0.0 samples = 2 value = [2, 0] 2548->2550 2553 entropy = 0.0 samples = 1 value = [0, 1] 2552->2553 2554 entropy = 0.0 samples = 6 value = [6, 0] 2552->2554 2556 arrival_date_week_number <= 32.5 entropy = 0.544 samples = 8 value = [1, 7] 2555->2556 2559 entropy = 0.0 samples = 1 value = [1, 0] 2555->2559 2557 entropy = 0.0 samples = 6 value = [0, 6] 2556->2557 2558 entropy = 1.0 samples = 2 value = [1, 1] 2556->2558 2561 adr <= 303.4 entropy = 0.931 samples = 49 value = [32, 17] 2560->2561 2586 entropy = 0.0 samples = 11 value = [11, 0] 2560->2586 2562 adr <= 229.445 entropy = 0.887 samples = 46 value = [32, 14] 2561->2562 2585 entropy = 0.0 samples = 3 value = [0, 3] 2561->2585 2563 stays_in_week_nights <= 1.5 entropy = 0.957 samples = 37 value = [23, 14] 2562->2563 2584 entropy = 0.0 samples = 9 value = [9, 0] 2562->2584 2564 adr <= 196.335 entropy = 0.946 samples = 11 value = [4, 7] 2563->2564 2571 adr <= 211.0 entropy = 0.84 samples = 26 value = [19, 7] 2563->2571 2565 entropy = 0.0 samples = 4 value = [0, 4] 2564->2565 2566 lead_time <= 61.5 entropy = 0.985 samples = 7 value = [4, 3] 2564->2566 2567 reserved_room_type_C <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 2566->2567 2570 entropy = 0.0 samples = 2 value = [0, 2] 2566->2570 2568 entropy = 0.0 samples = 4 value = [4, 0] 2567->2568 2569 entropy = 0.0 samples = 1 value = [0, 1] 2567->2569 2572 arrival_date_day_of_month <= 15.5 entropy = 0.61 samples = 20 value = [17, 3] 2571->2572 2579 meal_BB <= 0.5 entropy = 0.918 samples = 6 value = [2, 4] 2571->2579 2573 entropy = 0.0 samples = 10 value = [10, 0] 2572->2573 2574 booking_changes <= 0.5 entropy = 0.881 samples = 10 value = [7, 3] 2572->2574 2575 stays_in_weekend_nights <= 1.5 entropy = 0.544 samples = 8 value = [7, 1] 2574->2575 2578 entropy = 0.0 samples = 2 value = [0, 2] 2574->2578 2576 entropy = 0.0 samples = 7 value = [7, 0] 2575->2576 2577 entropy = 0.0 samples = 1 value = [0, 1] 2575->2577 2580 entropy = 0.0 samples = 3 value = [0, 3] 2579->2580 2581 lead_time <= 16.0 entropy = 0.918 samples = 3 value = [2, 1] 2579->2581 2582 entropy = 0.0 samples = 2 value = [2, 0] 2581->2582 2583 entropy = 0.0 samples = 1 value = [0, 1] 2581->2583 2588 arrival_date_week_number <= 31.5 entropy = 0.794 samples = 92 value = [22, 70] 2587->2588 2623 lead_time <= 139.5 entropy = 0.992 samples = 58 value = [26, 32] 2587->2623 2589 adr <= 164.915 entropy = 0.863 samples = 77 value = [22, 55] 2588->2589 2622 entropy = 0.0 samples = 15 value = [0, 15] 2588->2622 2590 lead_time <= 32.5 entropy = 0.744 samples = 52 value = [11, 41] 2589->2590 2611 adr <= 172.5 entropy = 0.99 samples = 25 value = [11, 14] 2589->2611 2591 entropy = 0.0 samples = 13 value = [0, 13] 2590->2591 2592 arrival_date_month_February <= 0.5 entropy = 0.858 samples = 39 value = [11, 28] 2590->2592 2593 booking_changes <= 1.0 entropy = 0.8 samples = 37 value = [9, 28] 2592->2593 2610 entropy = 0.0 samples = 2 value = [2, 0] 2592->2610 2594 arrival_date_week_number <= 19.5 entropy = 0.684 samples = 33 value = [6, 27] 2593->2594 2607 stays_in_week_nights <= 2.5 entropy = 0.811 samples = 4 value = [3, 1] 2593->2607 2595 entropy = 0.0 samples = 13 value = [0, 13] 2594->2595 2596 arrival_date_day_of_month <= 9.0 entropy = 0.881 samples = 20 value = [6, 14] 2594->2596 2597 entropy = 0.0 samples = 4 value = [0, 4] 2596->2597 2598 arrival_date_day_of_month <= 11.5 entropy = 0.954 samples = 16 value = [6, 10] 2596->2598 2599 entropy = 0.0 samples = 2 value = [2, 0] 2598->2599 2600 lead_time <= 223.0 entropy = 0.863 samples = 14 value = [4, 10] 2598->2600 2601 total_of_special_requests <= 2.5 entropy = 0.469 samples = 10 value = [1, 9] 2600->2601 2604 stays_in_week_nights <= 3.5 entropy = 0.811 samples = 4 value = [3, 1] 2600->2604 2602 entropy = 0.0 samples = 8 value = [0, 8] 2601->2602 2603 entropy = 1.0 samples = 2 value = [1, 1] 2601->2603 2605 entropy = 0.0 samples = 3 value = [3, 0] 2604->2605 2606 entropy = 0.0 samples = 1 value = [0, 1] 2604->2606 2608 entropy = 0.0 samples = 3 value = [3, 0] 2607->2608 2609 entropy = 0.0 samples = 1 value = [0, 1] 2607->2609 2612 entropy = 0.0 samples = 5 value = [5, 0] 2611->2612 2613 lead_time <= 23.5 entropy = 0.881 samples = 20 value = [6, 14] 2611->2613 2614 adults <= 1.5 entropy = 0.722 samples = 5 value = [4, 1] 2613->2614 2617 adults <= 1.5 entropy = 0.567 samples = 15 value = [2, 13] 2613->2617 2615 entropy = 0.0 samples = 1 value = [0, 1] 2614->2615 2616 entropy = 0.0 samples = 4 value = [4, 0] 2614->2616 2618 entropy = 0.0 samples = 1 value = [1, 0] 2617->2618 2619 arrival_date_day_of_month <= 29.0 entropy = 0.371 samples = 14 value = [1, 13] 2617->2619 2620 entropy = 0.0 samples = 12 value = [0, 12] 2619->2620 2621 entropy = 1.0 samples = 2 value = [1, 1] 2619->2621 2624 arrival_date_month_June <= 0.5 entropy = 0.971 samples = 35 value = [21, 14] 2623->2624 2649 lead_time <= 198.0 entropy = 0.755 samples = 23 value = [5, 18] 2623->2649 2625 booking_changes <= 0.5 entropy = 0.918 samples = 30 value = [20, 10] 2624->2625 2646 booking_changes <= 1.0 entropy = 0.722 samples = 5 value = [1, 4] 2624->2646 2626 adults <= 2.5 entropy = 0.826 samples = 27 value = [20, 7] 2625->2626 2645 entropy = 0.0 samples = 3 value = [0, 3] 2625->2645 2627 lead_time <= 132.5 entropy = 0.61 samples = 20 value = [17, 3] 2626->2627 2638 lead_time <= 62.5 entropy = 0.985 samples = 7 value = [3, 4] 2626->2638 2628 stays_in_week_nights <= 2.5 entropy = 0.485 samples = 19 value = [17, 2] 2627->2628 2637 entropy = 0.0 samples = 1 value = [0, 1] 2627->2637 2629 stays_in_week_nights <= 1.5 entropy = 0.764 samples = 9 value = [7, 2] 2628->2629 2636 entropy = 0.0 samples = 10 value = [10, 0] 2628->2636 2630 entropy = 0.0 samples = 4 value = [4, 0] 2629->2630 2631 arrival_date_day_of_month <= 13.0 entropy = 0.971 samples = 5 value = [3, 2] 2629->2631 2632 entropy = 0.0 samples = 2 value = [2, 0] 2631->2632 2633 reserved_room_type_D <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 2631->2633 2634 entropy = 0.0 samples = 2 value = [0, 2] 2633->2634 2635 entropy = 0.0 samples = 1 value = [1, 0] 2633->2635 2639 entropy = 0.0 samples = 2 value = [0, 2] 2638->2639 2640 lead_time <= 101.0 entropy = 0.971 samples = 5 value = [3, 2] 2638->2640 2641 entropy = 0.0 samples = 2 value = [2, 0] 2640->2641 2642 meal_HB <= 0.5 entropy = 0.918 samples = 3 value = [1, 2] 2640->2642 2643 entropy = 0.0 samples = 2 value = [0, 2] 2642->2643 2644 entropy = 0.0 samples = 1 value = [1, 0] 2642->2644 2647 entropy = 0.0 samples = 4 value = [0, 4] 2646->2647 2648 entropy = 0.0 samples = 1 value = [1, 0] 2646->2648 2650 entropy = 0.0 samples = 11 value = [0, 11] 2649->2650 2651 meal_BB <= 0.5 entropy = 0.98 samples = 12 value = [5, 7] 2649->2651 2652 entropy = 0.0 samples = 4 value = [0, 4] 2651->2652 2653 lead_time <= 244.0 entropy = 0.954 samples = 8 value = [5, 3] 2651->2653 2654 entropy = 0.0 samples = 4 value = [4, 0] 2653->2654 2655 stays_in_week_nights <= 6.5 entropy = 0.811 samples = 4 value = [1, 3] 2653->2655 2656 entropy = 0.0 samples = 3 value = [0, 3] 2655->2656 2657 entropy = 0.0 samples = 1 value = [1, 0] 2655->2657 2659 hotel_Resort Hotel <= 0.5 entropy = 0.995 samples = 444 value = [203, 241] 2658->2659 2854 lead_time <= 219.0 entropy = 0.513 samples = 35 value = [31, 4] 2658->2854 2660 assigned_room_type_A <= 0.5 entropy = 0.892 samples = 194 value = [60, 134] 2659->2660 2739 customer_type_Transient-Party <= 0.5 entropy = 0.985 samples = 250 value = [143, 107] 2659->2739 2661 reserved_room_type_G <= 0.5 entropy = 0.69 samples = 65 value = [12, 53] 2660->2661 2682 arrival_date_year <= 2015.5 entropy = 0.952 samples = 129 value = [48, 81] 2660->2682 2662 arrival_date_year <= 2016.5 entropy = 0.598 samples = 62 value = [9, 53] 2661->2662 2681 entropy = 0.0 samples = 3 value = [3, 0] 2661->2681 2663 arrival_date_week_number <= 39.0 entropy = 0.408 samples = 49 value = [4, 45] 2662->2663 2672 assigned_room_type_D <= 0.5 entropy = 0.961 samples = 13 value = [5, 8] 2662->2672 2664 entropy = 0.0 samples = 29 value = [0, 29] 2663->2664 2665 adr <= 117.455 entropy = 0.722 samples = 20 value = [4, 16] 2663->2665 2666 customer_type_Transient-Party <= 0.5 entropy = 0.353 samples = 15 value = [1, 14] 2665->2666 2669 children <= 0.5 entropy = 0.971 samples = 5 value = [3, 2] 2665->2669 2667 entropy = 0.0 samples = 14 value = [0, 14] 2666->2667 2668 entropy = 0.0 samples = 1 value = [1, 0] 2666->2668 2670 entropy = 0.0 samples = 3 value = [3, 0] 2669->2670 2671 entropy = 0.0 samples = 2 value = [0, 2] 2669->2671 2673 entropy = 0.0 samples = 4 value = [0, 4] 2672->2673 2674 arrival_date_day_of_month <= 27.5 entropy = 0.991 samples = 9 value = [5, 4] 2672->2674 2675 arrival_date_week_number <= 1.5 entropy = 0.863 samples = 7 value = [5, 2] 2674->2675 2680 entropy = 0.0 samples = 2 value = [0, 2] 2674->2680 2676 entropy = 0.0 samples = 1 value = [0, 1] 2675->2676 2677 arrival_date_month_July <= 0.5 entropy = 0.65 samples = 6 value = [5, 1] 2675->2677 2678 entropy = 0.0 samples = 5 value = [5, 0] 2677->2678 2679 entropy = 0.0 samples = 1 value = [0, 1] 2677->2679 2683 arrival_date_week_number <= 38.5 entropy = 0.968 samples = 38 value = [23, 15] 2682->2683 2704 arrival_date_year <= 2016.5 entropy = 0.848 samples = 91 value = [25, 66] 2682->2704 2684 arrival_date_day_of_month <= 12.5 entropy = 0.65 samples = 18 value = [15, 3] 2683->2684 2689 lead_time <= 27.0 entropy = 0.971 samples = 20 value = [8, 12] 2683->2689 2685 customer_type_Transient <= 0.5 entropy = 0.811 samples = 4 value = [1, 3] 2684->2685 2688 entropy = 0.0 samples = 14 value = [14, 0] 2684->2688 2686 entropy = 0.0 samples = 3 value = [0, 3] 2685->2686 2687 entropy = 0.0 samples = 1 value = [1, 0] 2685->2687 2690 entropy = 0.0 samples = 4 value = [0, 4] 2689->2690 2691 stays_in_week_nights <= 1.5 entropy = 1.0 samples = 16 value = [8, 8] 2689->2691 2692 entropy = 0.0 samples = 3 value = [0, 3] 2691->2692 2693 customer_type_Contract <= 0.5 entropy = 0.961 samples = 13 value = [8, 5] 2691->2693 2694 lead_time <= 31.0 entropy = 0.811 samples = 4 value = [1, 3] 2693->2694 2697 arrival_date_month_November <= 0.5 entropy = 0.764 samples = 9 value = [7, 2] 2693->2697 2695 entropy = 0.0 samples = 1 value = [1, 0] 2694->2695 2696 entropy = 0.0 samples = 3 value = [0, 3] 2694->2696 2698 entropy = 0.0 samples = 5 value = [5, 0] 2697->2698 2699 stays_in_week_nights <= 2.5 entropy = 1.0 samples = 4 value = [2, 2] 2697->2699 2700 entropy = 0.0 samples = 1 value = [0, 1] 2699->2700 2701 stays_in_week_nights <= 4.5 entropy = 0.918 samples = 3 value = [2, 1] 2699->2701 2702 entropy = 0.0 samples = 2 value = [2, 0] 2701->2702 2703 entropy = 0.0 samples = 1 value = [0, 1] 2701->2703 2705 arrival_date_month_June <= 0.5 entropy = 0.716 samples = 66 value = [13, 53] 2704->2705 2728 adr <= 88.5 entropy = 0.999 samples = 25 value = [12, 13] 2704->2728 2706 arrival_date_day_of_month <= 24.5 entropy = 0.796 samples = 54 value = [13, 41] 2705->2706 2727 entropy = 0.0 samples = 12 value = [0, 12] 2705->2727 2707 arrival_date_day_of_month <= 11.5 entropy = 0.641 samples = 43 value = [7, 36] 2706->2707 2720 lead_time <= 20.0 entropy = 0.994 samples = 11 value = [6, 5] 2706->2720 2708 adr <= 79.16 entropy = 0.902 samples = 22 value = [7, 15] 2707->2708 2719 entropy = 0.0 samples = 21 value = [0, 21] 2707->2719 2709 entropy = 0.0 samples = 3 value = [3, 0] 2708->2709 2710 customer_type_Contract <= 0.5 entropy = 0.742 samples = 19 value = [4, 15] 2708->2710 2711 arrival_date_day_of_month <= 6.5 entropy = 0.65 samples = 18 value = [3, 15] 2710->2711 2718 entropy = 0.0 samples = 1 value = [1, 0] 2710->2718 2712 entropy = 0.0 samples = 9 value = [0, 9] 2711->2712 2713 arrival_date_day_of_month <= 8.5 entropy = 0.918 samples = 9 value = [3, 6] 2711->2713 2714 entropy = 0.0 samples = 2 value = [2, 0] 2713->2714 2715 adr <= 87.09 entropy = 0.592 samples = 7 value = [1, 6] 2713->2715 2716 entropy = 0.0 samples = 1 value = [1, 0] 2715->2716 2717 entropy = 0.0 samples = 6 value = [0, 6] 2715->2717 2721 entropy = 0.0 samples = 3 value = [0, 3] 2720->2721 2722 lead_time <= 195.0 entropy = 0.811 samples = 8 value = [6, 2] 2720->2722 2723 stays_in_week_nights <= 0.5 entropy = 0.592 samples = 7 value = [6, 1] 2722->2723 2726 entropy = 0.0 samples = 1 value = [0, 1] 2722->2726 2724 entropy = 1.0 samples = 2 value = [1, 1] 2723->2724 2725 entropy = 0.0 samples = 5 value = [5, 0] 2723->2725 2729 entropy = 0.0 samples = 4 value = [0, 4] 2728->2729 2730 adr <= 140.645 entropy = 0.985 samples = 21 value = [12, 9] 2728->2730 2731 lead_time <= 16.0 entropy = 0.918 samples = 18 value = [12, 6] 2730->2731 2738 entropy = 0.0 samples = 3 value = [0, 3] 2730->2738 2732 entropy = 0.0 samples = 3 value = [0, 3] 2731->2732 2733 lead_time <= 75.5 entropy = 0.722 samples = 15 value = [12, 3] 2731->2733 2734 entropy = 0.0 samples = 10 value = [10, 0] 2733->2734 2735 lead_time <= 240.5 entropy = 0.971 samples = 5 value = [2, 3] 2733->2735 2736 entropy = 0.0 samples = 3 value = [0, 3] 2735->2736 2737 entropy = 0.0 samples = 2 value = [2, 0] 2735->2737 2740 lead_time <= 218.5 entropy = 0.975 samples = 241 value = [143, 98] 2739->2740 2853 entropy = 0.0 samples = 9 value = [0, 9] 2739->2853 2741 lead_time <= 169.5 entropy = 0.948 samples = 213 value = [135, 78] 2740->2741 2840 lead_time <= 252.0 entropy = 0.863 samples = 28 value = [8, 20] 2740->2840 2742 arrival_date_week_number <= 25.5 entropy = 0.975 samples = 187 value = [111, 76] 2741->2742 2835 arrival_date_day_of_month <= 4.0 entropy = 0.391 samples = 26 value = [24, 2] 2741->2835 2743 lead_time <= 13.5 entropy = 0.827 samples = 73 value = [54, 19] 2742->2743 2768 stays_in_weekend_nights <= 2.5 entropy = 1.0 samples = 114 value = [57, 57] 2742->2768 2744 entropy = 0.0 samples = 5 value = [0, 5] 2743->2744 2745 lead_time <= 69.5 entropy = 0.734 samples = 68 value = [54, 14] 2743->2745 2746 reserved_room_type_A <= 0.5 entropy = 0.909 samples = 37 value = [25, 12] 2745->2746 2763 lead_time <= 144.5 entropy = 0.345 samples = 31 value = [29, 2] 2745->2763 2747 arrival_date_month_April <= 0.5 entropy = 0.567 samples = 15 value = [13, 2] 2746->2747 2754 assigned_room_type_A <= 0.5 entropy = 0.994 samples = 22 value = [12, 10] 2746->2754 2748 entropy = 0.0 samples = 10 value = [10, 0] 2747->2748 2749 lead_time <= 44.5 entropy = 0.971 samples = 5 value = [3, 2] 2747->2749 2750 adr <= 78.13 entropy = 0.918 samples = 3 value = [1, 2] 2749->2750 2753 entropy = 0.0 samples = 2 value = [2, 0] 2749->2753 2751 entropy = 0.0 samples = 1 value = [1, 0] 2750->2751 2752 entropy = 0.0 samples = 2 value = [0, 2] 2750->2752 2755 entropy = 0.0 samples = 5 value = [0, 5] 2754->2755 2756 arrival_date_day_of_month <= 12.5 entropy = 0.874 samples = 17 value = [12, 5] 2754->2756 2757 stays_in_weekend_nights <= 0.5 entropy = 0.991 samples = 9 value = [4, 5] 2756->2757 2762 entropy = 0.0 samples = 8 value = [8, 0] 2756->2762 2758 entropy = 0.0 samples = 3 value = [3, 0] 2757->2758 2759 lead_time <= 19.5 entropy = 0.65 samples = 6 value = [1, 5] 2757->2759 2760 entropy = 0.0 samples = 1 value = [1, 0] 2759->2760 2761 entropy = 0.0 samples = 5 value = [0, 5] 2759->2761 2764 entropy = 0.0 samples = 25 value = [25, 0] 2763->2764 2765 arrival_date_week_number <= 23.5 entropy = 0.918 samples = 6 value = [4, 2] 2763->2765 2766 entropy = 0.0 samples = 2 value = [0, 2] 2765->2766 2767 entropy = 0.0 samples = 4 value = [4, 0] 2765->2767 2769 booking_changes <= 0.5 entropy = 0.998 samples = 108 value = [51, 57] 2768->2769 2834 entropy = 0.0 samples = 6 value = [6, 0] 2768->2834 2770 total_of_special_requests <= 2.5 entropy = 0.999 samples = 88 value = [46, 42] 2769->2770 2823 adr <= 158.835 entropy = 0.811 samples = 20 value = [5, 15] 2769->2823 2771 adr <= 16.2 entropy = 0.989 samples = 82 value = [46, 36] 2770->2771 2822 entropy = 0.0 samples = 6 value = [0, 6] 2770->2822 2772 entropy = 0.0 samples = 3 value = [0, 3] 2771->2772 2773 adr <= 45.7 entropy = 0.98 samples = 79 value = [46, 33] 2771->2773 2774 arrival_date_day_of_month <= 3.5 entropy = 0.469 samples = 10 value = [9, 1] 2773->2774 2777 lead_time <= 164.0 entropy = 0.996 samples = 69 value = [37, 32] 2773->2777 2775 entropy = 0.0 samples = 1 value = [0, 1] 2774->2775 2776 entropy = 0.0 samples = 9 value = [9, 0] 2774->2776 2778 arrival_date_day_of_month <= 29.5 entropy = 0.992 samples = 67 value = [37, 30] 2777->2778 2821 entropy = 0.0 samples = 2 value = [0, 2] 2777->2821 2779 arrival_date_week_number <= 49.5 entropy = 0.997 samples = 64 value = [34, 30] 2778->2779 2820 entropy = 0.0 samples = 3 value = [3, 0] 2778->2820 2780 reserved_room_type_E <= 0.5 entropy = 0.987 samples = 60 value = [34, 26] 2779->2780 2819 entropy = 0.0 samples = 4 value = [0, 4] 2779->2819 2781 assigned_room_type_E <= 0.5 entropy = 0.996 samples = 56 value = [30, 26] 2780->2781 2818 entropy = 0.0 samples = 4 value = [4, 0] 2780->2818 2782 lead_time <= 15.5 entropy = 0.991 samples = 54 value = [30, 24] 2781->2782 2817 entropy = 0.0 samples = 2 value = [0, 2] 2781->2817 2783 entropy = 0.0 samples = 3 value = [3, 0] 2782->2783 2784 lead_time <= 19.5 entropy = 0.998 samples = 51 value = [27, 24] 2782->2784 2785 entropy = 0.0 samples = 3 value = [0, 3] 2784->2785 2786 assigned_room_type_F <= 0.5 entropy = 0.989 samples = 48 value = [27, 21] 2784->2786 2787 stays_in_week_nights <= 2.5 entropy = 0.984 samples = 47 value = [27, 20] 2786->2787 2816 entropy = 0.0 samples = 1 value = [0, 1] 2786->2816 2788 adr <= 150.0 entropy = 0.989 samples = 16 value = [7, 9] 2787->2788 2797 arrival_date_week_number <= 36.5 entropy = 0.938 samples = 31 value = [20, 11] 2787->2797 2789 arrival_date_day_of_month <= 15.0 entropy = 0.722 samples = 10 value = [2, 8] 2788->2789 2794 arrival_date_week_number <= 29.5 entropy = 0.65 samples = 6 value = [5, 1] 2788->2794 2790 arrival_date_week_number <= 36.5 entropy = 1.0 samples = 4 value = [2, 2] 2789->2790 2793 entropy = 0.0 samples = 6 value = [0, 6] 2789->2793 2791 entropy = 0.0 samples = 2 value = [0, 2] 2790->2791 2792 entropy = 0.0 samples = 2 value = [2, 0] 2790->2792 2795 entropy = 1.0 samples = 2 value = [1, 1] 2794->2795 2796 entropy = 0.0 samples = 4 value = [4, 0] 2794->2796 2798 adr <= 236.0 entropy = 0.995 samples = 24 value = [13, 11] 2797->2798 2815 entropy = 0.0 samples = 7 value = [7, 0] 2797->2815 2799 adr <= 139.4 entropy = 0.976 samples = 22 value = [13, 9] 2798->2799 2814 entropy = 0.0 samples = 2 value = [0, 2] 2798->2814 2800 children <= 0.5 entropy = 0.971 samples = 10 value = [4, 6] 2799->2800 2807 adr <= 168.0 entropy = 0.811 samples = 12 value = [9, 3] 2799->2807 2801 adr <= 128.5 entropy = 0.811 samples = 8 value = [2, 6] 2800->2801 2806 entropy = 0.0 samples = 2 value = [2, 0] 2800->2806 2802 entropy = 0.0 samples = 4 value = [0, 4] 2801->2802 2803 arrival_date_day_of_month <= 24.5 entropy = 1.0 samples = 4 value = [2, 2] 2801->2803 2804 entropy = 0.0 samples = 2 value = [2, 0] 2803->2804 2805 entropy = 0.0 samples = 2 value = [0, 2] 2803->2805 2808 entropy = 0.0 samples = 5 value = [5, 0] 2807->2808 2809 arrival_date_week_number <= 32.5 entropy = 0.985 samples = 7 value = [4, 3] 2807->2809 2810 arrival_date_week_number <= 28.5 entropy = 0.811 samples = 4 value = [1, 3] 2809->2810 2813 entropy = 0.0 samples = 3 value = [3, 0] 2809->2813 2811 entropy = 0.0 samples = 1 value = [1, 0] 2810->2811 2812 entropy = 0.0 samples = 3 value = [0, 3] 2810->2812 2824 arrival_date_week_number <= 46.0 entropy = 0.961 samples = 13 value = [5, 8] 2823->2824 2833 entropy = 0.0 samples = 7 value = [0, 7] 2823->2833 2825 lead_time <= 60.5 entropy = 1.0 samples = 10 value = [5, 5] 2824->2825 2832 entropy = 0.0 samples = 3 value = [0, 3] 2824->2832 2826 entropy = 0.0 samples = 3 value = [3, 0] 2825->2826 2827 lead_time <= 141.5 entropy = 0.863 samples = 7 value = [2, 5] 2825->2827 2828 meal_BB <= 0.5 entropy = 0.65 samples = 6 value = [1, 5] 2827->2828 2831 entropy = 0.0 samples = 1 value = [1, 0] 2827->2831 2829 entropy = 1.0 samples = 2 value = [1, 1] 2828->2829 2830 entropy = 0.0 samples = 4 value = [0, 4] 2828->2830 2836 lead_time <= 175.5 entropy = 0.918 samples = 3 value = [1, 2] 2835->2836 2839 entropy = 0.0 samples = 23 value = [23, 0] 2835->2839 2837 entropy = 0.0 samples = 1 value = [1, 0] 2836->2837 2838 entropy = 0.0 samples = 2 value = [0, 2] 2836->2838 2841 entropy = 0.0 samples = 9 value = [0, 9] 2840->2841 2842 arrival_date_day_of_month <= 4.5 entropy = 0.982 samples = 19 value = [8, 11] 2840->2842 2843 entropy = 0.0 samples = 3 value = [3, 0] 2842->2843 2844 lead_time <= 290.5 entropy = 0.896 samples = 16 value = [5, 11] 2842->2844 2845 adr <= 104.5 entropy = 0.994 samples = 11 value = [5, 6] 2844->2845 2852 entropy = 0.0 samples = 5 value = [0, 5] 2844->2852 2846 lead_time <= 256.5 entropy = 0.65 samples = 6 value = [1, 5] 2845->2846 2849 reserved_room_type_C <= 0.5 entropy = 0.722 samples = 5 value = [4, 1] 2845->2849 2847 entropy = 0.0 samples = 1 value = [1, 0] 2846->2847 2848 entropy = 0.0 samples = 5 value = [0, 5] 2846->2848 2850 entropy = 0.0 samples = 4 value = [4, 0] 2849->2850 2851 entropy = 0.0 samples = 1 value = [0, 1] 2849->2851 2855 entropy = 0.0 samples = 29 value = [29, 0] 2854->2855 2856 arrival_date_month_September <= 0.5 entropy = 0.918 samples = 6 value = [2, 4] 2854->2856 2857 entropy = 0.0 samples = 4 value = [0, 4] 2856->2857 2858 entropy = 0.0 samples = 2 value = [2, 0] 2856->2858
In [160]:
dot_data = tree.export_graphviz(treeclf,out_file=None,feature_names=train1.columns,class_names=True)
graph = graphviz.Source(dot_data)
graph.render("image",view=True)
Out[160]:
'image.pdf'

3.3 SVM¶

In [91]:
from sklearn.svm import SVC
# initialize svm classifier with balanced class weights
clf4 = SVC(class_weight='balanced', tol=1e-2, cache_size= 600)                  
# grid search parameters
parameters = {'gamma': [0.1, .25, .4, .5, .7], 'C': [20, 40, 50, 75, 100], 'kernel': ['rbf', 'linear']}
# initialize grid search with 3-fold cv
gs4 = GridSearchCV(clf4, parameters, verbose=1, cv=3)
In [93]:
#Perform grid search on sample
gs4.fit(train, target_train)   
Fitting 3 folds for each of 50 candidates, totalling 150 fits
Out[93]:
GridSearchCV(cv=3,
             estimator=SVC(cache_size=600, class_weight='balanced', tol=0.01),
             param_grid={'C': [20, 40, 50, 75, 100],
                         'gamma': [0.1, 0.25, 0.4, 0.5, 0.7],
                         'kernel': ['rbf', 'linear']},
             verbose=1)
In [94]:
for (i, j) in gs4.best_params_.items():
    print ("The optimal value of", i, "is:", j)
print()
print("The best cross-validation accuracy on the sample data was: {}".format(gs4.best_score_))
The optimal value of C is: 100
The optimal value of gamma is: 0.4
The optimal value of kernel is: rbf

The best cross-validation accuracy on the sample data was: 0.7862638234936336
In [125]:
#Initializing svm classifier with optimal parameters
clf4 = SVC(kernel='rbf', gamma =.4, C=100, class_weight='balanced') 
#Fit on training data
clf4.fit(train, target_train)   
#Predict classes of test set
pred_test_target = clf4.predict(test)
#Printing the classification report
print(classification_report(target_test, pred_test_target))  # print classification report
              precision    recall  f1-score   support

           0       0.55      0.68      0.61       702
           1       0.89      0.82      0.85      2159

    accuracy                           0.79      2861
   macro avg       0.72      0.75      0.73      2861
weighted avg       0.81      0.79      0.79      2861

In [147]:
from sklearn.model_selection import cross_val_score

clf4 = SVC(kernel='rbf', gamma =.7, C=100, class_weight='balanced') 
cv_scores = cross_val_score(clf4, train, target_train, cv=5)

print(f"\nCross validation scores:\n{cv_scores}")
print("\n Overall Accuracy on X-Val: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))

clf4.fit(train, target_train) 
print("\n Accuracy on Training: ",  clf4.score(train, target_train))

clf4.fit(test, target_test)
print("\n Accuracy on Testing: ",  clf4.score(test, target_test))
Cross validation scores:
[0.77632 0.78681 0.7903  0.782   0.79065]

 Overall Accuracy on X-Val: 0.79 (+/- 0.01)

 Accuracy on Training:  0.9529010835372247

 Accuracy on Testing:  0.9849702901083537

3.4 Naive Bayes¶

In [96]:
# Building a Naive Bayes classifier:
    
    # Notes:
    # 1-GaussianNB assumes distributions are normal 
    # 2-normalization is not needed for this classifier

from sklearn import naive_bayes
from sklearn.model_selection import cross_val_score  # for cross-validation
from sklearn.metrics import classification_report
    
nbclf = naive_bayes.GaussianNB()  
nbclf = nbclf.fit(train, target_train)
print("Score on Training: ", nbclf.score(train, target_train))
print("Score on Test: ", nbclf.score(test, target_test))

# perform 10-fold cross-validation on the 80% training data
cv_scores = cross_val_score(nbclf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")

# report the overall average accuracy.
print("\nOverall Accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(),cv_scores.std()*2))

# Compare the cv accuracy to the model accuracy on the train data as a whole. 
print("\nAccuracy on Training: ",  nbclf.score(train, target_train))

# run your model on the set-aside 20% test data
nbpreds_test = nbclf.predict(test)
print("\nAccuracy on 20% test set: ",  nbclf.score(test, target_test))
print("\n\nclassification report:\n")
print(classification_report(target_test, nbpreds_test))
Score on Training:  0.4556099265990912
Score on Test:  0.4627752534078993

Cross validation scores:
[0.45066 0.45328 0.44192 0.44891 0.46241 0.44843 0.46766 0.46154 0.45717
 0.46416]

Overall Accuracy: 0.46 (+/- 0.02)

Accuracy on Training:  0.4556099265990912

Accuracy on 20% test set:  0.4627752534078993


classification report:

              precision    recall  f1-score   support

           0       0.31      0.97      0.47       702
           1       0.97      0.30      0.45      2159

    accuracy                           0.46      2861
   macro avg       0.64      0.64      0.46      2861
weighted avg       0.81      0.46      0.46      2861

In [97]:
# Hyperparameter tuning for Naive Bayes:
from sklearn.model_selection import GridSearchCV

parameters = {'var_smoothing': (1e-9,1e-7, 1e-5, 1e-4,1e-2)}

gs = GridSearchCV(nbclf, parameters, scoring = 'accuracy', cv=10)
In [98]:
_ = gs.fit(train, target_train)

gs.best_params_, gs.best_score_
Out[98]:
({'var_smoothing': 0.01}, 0.5263936391119797)
In [99]:
# Creating the best model based on the above grid search results:

best_nbclf = naive_bayes.GaussianNB(var_smoothing= 0.01)  
best_nbclf = best_nbclf.fit(train, target_train)
print("Score on Training: ", best_nbclf.score(train, target_train))
print("Score on Test: ", best_nbclf.score(test, target_test))

# perform 10-fold cross-validation on the 80% training data
cv_scores = cross_val_score(best_nbclf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")

# report the overall average accuracy.
print("\nOverall Accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(),cv_scores.std()*2))

# Compare the cv accuracy to the model accuracy on the train data as a whole. 
print("\nAccuracy on Training: ",  best_nbclf.score(train, target_train))

# run your model on the set-aside 20% test data
best_nbclf_test = best_nbclf.predict(test)
print("\nAccuracy on 20% test set: ",  best_nbclf.score(test, target_test))
print("\n\nclassification report:\n")
print(classification_report(target_test, best_nbclf_test))
Score on Training:  0.5297972736805313
Score on Test:  0.5456134218804614

Cross validation scores:
[0.50917 0.51703 0.52664 0.50393 0.54983 0.53584 0.53846 0.52448 0.52273
 0.53584]

Overall Accuracy: 0.53 (+/- 0.03)

Accuracy on Training:  0.5297972736805313

Accuracy on 20% test set:  0.5456134218804614


classification report:

              precision    recall  f1-score   support

           0       0.34      0.87      0.48       702
           1       0.91      0.44      0.59      2159

    accuracy                           0.55      2861
   macro avg       0.62      0.66      0.54      2861
weighted avg       0.77      0.55      0.57      2861

In [100]:
# Usin the grid search results (var_smoothing= 1e-07) slightly increased the accuracy on train and test
In [101]:
from sklearn.metrics import confusion_matrix
# generate the confusion matrix
nbcm = confusion_matrix(target_test, best_nbclf_test)
print(nbcm)
plt.matshow(nbcm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show();
[[ 611   91]
 [1209  950]]

Multinomial Naive Bayes

We used Multinomial Naive Bayes which we assumed would work slightly better on our mostly discrete dataset. However, that classifier did not provide a higher accuracy either, so we skipped the parameter tuning step for it.

In [102]:
nbmclf = naive_bayes.MultinomialNB()   
nbmclf = nbclf.fit(train, target_train)

# 10-fold cross-validation on the training data
cv_scores = cross_val_score(nbmclf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")

# Overall average accuracy.
print("\nOverall Accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(),cv_scores.std()*2))

# Compare the cv accuracy to the model accuracy on the train data as a whole. 
print("\nAccuracy on Training: ",  nbmclf.score(train, target_train))

# run your model on the set-aside test data
nbmclf_test = nbmclf.predict(test)
print("\nAccuracy on test set: ",  nbmclf.score(test, target_test))
print("\n\nclassification report:\n")
print(classification_report(target_test, nbmclf_test))
Cross validation scores:
[0.45066 0.45328 0.44192 0.44891 0.46241 0.44843 0.46766 0.46154 0.45717
 0.46416]

Overall Accuracy: 0.46 (+/- 0.02)

Accuracy on Training:  0.4556099265990912

Accuracy on test set:  0.4627752534078993


classification report:

              precision    recall  f1-score   support

           0       0.31      0.97      0.47       702
           1       0.97      0.30      0.45      2159

    accuracy                           0.46      2861
   macro avg       0.64      0.64      0.46      2861
weighted avg       0.81      0.46      0.46      2861

3.5 linear discriminant analysis¶

In [103]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis   # it is a classifier (supervised)

ldclf = LinearDiscriminantAnalysis()
ldclf = ldclf.fit(train, target_train)
ldpreds_test = ldclf.predict(test)

# 10-fold cross-validation on the training data
cv_scores = cross_val_score(ldclf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")

# Overall average accuracy.
print("\nOverall Accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(),cv_scores.std()*2))

# Compare the cv accuracy to the model accuracy on the train data as a whole. 
print("\nAccuracy on Training: ",  ldclf.score(train, target_train))

# run your model on the set-aside test data
ldclf_test = ldclf.predict(test)
print("\nAccuracy on test set: ",  ldclf.score(test, target_test))
print("\n\nclassification report:\n")
print(classification_report(target_test, ldclf_test))
Cross validation scores:
[0.78865 0.7869  0.78865 0.79476 0.77797 0.76486 0.78234 0.78234 0.79196
 0.7771 ]

Overall Accuracy: 0.78 (+/- 0.02)

Accuracy on Training:  0.7866130723523244

Accuracy on test set:  0.7790982174065012


classification report:

              precision    recall  f1-score   support

           0       0.59      0.32      0.41       702
           1       0.81      0.93      0.86      2159

    accuracy                           0.78      2861
   macro avg       0.70      0.62      0.64      2861
weighted avg       0.75      0.78      0.75      2861

In [104]:
# hyperparameter tuning for LDA:
# The most important parameter to tune in LDA is the solver:

parameters = {'solver': ['svd', 'lsqr']}  
gs = GridSearchCV(ldclf, parameters, scoring='accuracy', cv=10)
gs_res = gs.fit(train, target_train)

gs_res.best_params_, gs_res.best_score_
Out[104]:
({'solver': 'svd'}, 0.7835525391638928)
In [105]:
# Checkig if auto Shrinkage combined with the two other solvers provides a higher accuracy:
# Side note: svd only works with shrinkage=None

ldclf_2 = LinearDiscriminantAnalysis(shrinkage='auto')
parameters_2 = {'solver': ('lsqr','eigen')} 

gs = GridSearchCV(ldclf_2, parameters_2, scoring='accuracy', cv=10)
gs_res = gs.fit(train, target_train)

gs_res.best_params_, gs_res.best_score_
Out[105]:
({'solver': 'lsqr'}, 0.778136317830641)
In [106]:
# Creating the best model based on the above grid search results: svd results in a higher accuracy
# We get the same results as the model without parameter tuning because svd is the default solver
best_ldclf = LinearDiscriminantAnalysis(solver='svd')
best_ldclf = best_ldclf.fit(train, target_train)
ldpreds_test = best_ldclf.predict(test)

# 10-fold cross-validation on the training data
cv_scores = cross_val_score(best_ldclf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")

# Overall average accuracy.
print("\nOverall Accuracy: %0.2f (+/- %0.2f)" % (cv_scores.mean(),cv_scores.std()*2))

# Compare the cv accuracy to the model accuracy on the train data as a whole. 
print("\nAccuracy on Training: ",  best_ldclf.score(train, target_train))

# run your model on the set-aside test data
best_ldclf_test = best_ldclf.predict(test)
print("\nAccuracy on test set: ",  best_ldclf.score(test, target_test))
print("\n\nclassification report:\n")
print(classification_report(target_test, best_ldclf_test))
Cross validation scores:
[0.78865 0.7869  0.78865 0.79476 0.77797 0.76486 0.78234 0.78234 0.79196
 0.7771 ]

Overall Accuracy: 0.78 (+/- 0.02)

Accuracy on Training:  0.7866130723523244

Accuracy on test set:  0.7790982174065012


classification report:

              precision    recall  f1-score   support

           0       0.59      0.32      0.41       702
           1       0.81      0.93      0.86      2159

    accuracy                           0.78      2861
   macro avg       0.70      0.62      0.64      2861
weighted avg       0.75      0.78      0.75      2861

In [107]:
# Above is the same accuracy as the model with no parameter tuning because svd is the default solver.
In [108]:
# generate the confusion matrix
ldcm = confusion_matrix(target_test, best_ldclf_test)
print(ldcm)
[[ 222  480]
 [ 152 2007]]

3.6 Random Forest¶

In [109]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report, accuracy_score, recall_score, precision_score, confusion_matrix, roc_curve, roc_auc_score,plot_confusion_matrix
import warnings
warnings.filterwarnings('ignore')
In [110]:
def grid_search(X_tr, X_te, y_tr, y_te, model, params, scoring='recall'):
    gs = GridSearchCV(estimator = model, param_grid = params, scoring = scoring, n_jobs=-1, cv=3)
    gs.fit(X_tr, y_tr)
    y_pred = gs.predict(X_te)
    print(f"{model}")
    print(f"Best parameter      : {gs.best_params_}")
    print(f"Test Accuracy Score : {accuracy_score(y_te, y_pred)}")
    print(f"Train Accuracy Score: {accuracy_score(y_tr, gs.predict(X_tr))}")
    print(f"Recall score        : {recall_score(y_te, y_pred)}")
    print(f"Classification Report \n{'-'*30}\n {classification_report(y_te, y_pred)}")
    return gs.best_params_
In [111]:
params = {
    'n_estimators':[100, 200, 300, 400, 500, 600],
    'criterion' : ['gini', 'entropy'],
    'max_depth' : [5, 10, 15, 20],
}
model = RandomForestClassifier(random_state=33)
rf_best = grid_search(train, test, target_train, target_test, model, params, scoring='accuracy')
RandomForestClassifier(random_state=33)
Best parameter      : {'criterion': 'entropy', 'max_depth': 20, 'n_estimators': 600}
Test Accuracy Score : 0.8413142257951766
Train Accuracy Score: 0.9580566235581964
Recall score        : 0.9467345993515517
Classification Report 
------------------------------
               precision    recall  f1-score   support

           0       0.76      0.52      0.62       702
           1       0.86      0.95      0.90      2159

    accuracy                           0.84      2861
   macro avg       0.81      0.73      0.76      2861
weighted avg       0.83      0.84      0.83      2861

In [141]:
rf = RandomForestClassifier(n_estimators=rf_best['n_estimators'], criterion=rf_best['criterion'], max_depth=rf_best['max_depth'], 
                                  random_state=33)
In [144]:
from sklearn.model_selection import cross_val_score

cv_scores = cross_val_score(rf, train, target_train, cv=10)
print(f"\nCross validation scores:\n{cv_scores}")
print("\n Overall Accuracy on X-Val: %0.2f (+/- %0.2f)" % (cv_scores.mean(), cv_scores.std() * 2))

rf = rf.fit(train, target_train)
print("\n Accuracy on Training: ",  rf.score(train, target_train))

rf = rf.fit(test, target_test)
print("\n Accuracy on Testing: ",  rf.score(test, target_test))
Cross validation scores:
[0.84192 0.8559  0.83668 0.85066 0.84441 0.86276 0.83916 0.84878 0.8479
 0.85839]

 Overall Accuracy on X-Val: 0.85 (+/- 0.02)

 Accuracy on Training:  0.9580566235581964

 Accuracy on Testing:  0.9954561342188046
In [113]:
plot_confusion_matrix(rf, test, target_test)
plt.rcParams["figure.figsize"] = (12,12)
plt.show()

4 - Unsupervised Knowledge Discovery¶

4.1 Clustering¶

In [114]:
# Min-Max normalization:

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_norm = scaler.fit_transform(X_ssf)    
In [115]:
# plot_silhouettes function from class example:
# https://nbviewer.org/url/bmobasher.com/Class/CSC478/Clustering.ipynb

def plot_silhouettes(data, clusters, metric='euclidean'):
    
    from matplotlib import cm
    from sklearn.metrics import silhouette_samples

    cluster_labels = np.unique(clusters)
    n_clusters = cluster_labels.shape[0]
    silhouette_vals = metrics.silhouette_samples(data, clusters, metric='euclidean')
    c_ax_lower, c_ax_upper = 0, 0
    cticks = []
    for i, k in enumerate(cluster_labels):
        c_silhouette_vals = silhouette_vals[clusters == k]
        c_silhouette_vals.sort()
        c_ax_upper += len(c_silhouette_vals)
        color = cm.jet(float(i) / n_clusters)
        pl.barh(range(c_ax_lower, c_ax_upper), c_silhouette_vals, height=1.0, 
                      edgecolor='none', color=color)

        cticks.append((c_ax_lower + c_ax_upper) / 2)
        c_ax_lower += len(c_silhouette_vals)
    
    silhouette_avg = np.mean(silhouette_vals)
    pl.axvline(silhouette_avg, color="red", linestyle="--") 
    pl.yticks(cticks, cluster_labels)
    pl.ylabel('Cluster')
    pl.xlabel('Silhouette coefficient')
    pl.tight_layout()
    pl.show();
    
    return
In [116]:
from sklearn.cluster import KMeans 
from sklearn import metrics
from sklearn.metrics import completeness_score, homogeneity_score
from sklearn.metrics import silhouette_samples

def cluster_analysis(data, column_names, target, plot_silhouettes, k=7):
    "A function that does clustering analysis on data"
    
    DF = pd.DataFrame(data, columns=column_names)
    kms = KMeans(n_clusters=k, max_iter=500, verbose=0, n_init=5, random_state=33) 
    kms.fit(DF)
    clusters = kms.predict(data)   
    clusters = clusters.astype(int)
    
    cluster_num, size = np.unique(clusters, return_counts=True)
    print("\n"+"-"*10+"Cluster Analysis"+"-"*10+"\n")
    print(f"Number of clusters: {k}")
    
    centroids = pd.DataFrame(kms.cluster_centers_, columns=column_names)
    silhouettes = metrics.silhouette_samples(data, clusters)
    
    for i in range(len(cluster_num)):
        print("Size of Cluster", cluster_num[i], "= ", size[i])
    
    print("\nCluster centroids:")
#     pd.set_option('max_columns', None)
    display(centroids)
    print("\nSilhouette plot:")
    plot_silhouettes(data, clusters)
    
    print("\n\nMean Silhouette Value: ", silhouettes.mean())
    complete = completeness_score(target,clusters)
    print("Completeness score: ", complete)
    homogen = homogeneity_score(target,clusters)
    print("Homogeneity score: ",homogen,"\n")
    
    return silhouettes, complete, homogen, clusters, centroids
In [117]:
silhouettes_mean, completeness , homogeneity  =[],[],[]
k_range= list(range(2,20))  # trying k=2 all the way to k=20
for K in k_range:   
    sil,comp,hmg,_,_=cluster_analysis(X_norm,X_ssf.columns,y,plot_silhouettes,k=K)
    silhouettes_mean.append(sil.mean())
    completeness.append(comp)
    homogeneity.append(hmg)
----------Cluster Analysis----------

Number of clusters: 2
Size of Cluster 0 =  6104
Size of Cluster 1 =  8201

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.17278 0.48648 0.52169 0.49641 0.04530 0.05394 0.06369 0.01092 0.00377 0.06193 ... 0.71494 0.28244 0.00262 0.14007 0.55013 0.30980 0.03539 0.00508 0.56062 0.39892
1 0.11522 0.64181 0.48953 0.49122 0.06727 0.06863 0.07260 0.05361 0.00451 0.01097 ... 0.99902 0.00085 0.00012 0.00012 0.98549 0.01439 0.03487 0.00573 0.90416 0.05524

2 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.120709553994198
Completeness score:  0.020741610562666327
Homogeneity score:  0.02583576722378519 


----------Cluster Analysis----------

Number of clusters: 3
Size of Cluster 0 =  2561
Size of Cluster 1 =  7629
Size of Cluster 2 =  4115

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.07422 0.57536 0.49854 0.50138 0.04617 0.05489 0.06245 0.03918 0.00957 0.12612 ... 0.96017 0.03592 0.00390 -6.93889e-18 0.34752 0.65248 0.00195 0.00976 0.79032 0.19797
1 0.11597 0.63618 0.49156 0.48884 0.06681 0.06832 0.07258 0.05158 0.00413 0.00983 ... 0.99974 0.00013 0.00013 2.62261e-04 0.99607 0.00367 0.03383 0.00551 0.90454 0.05612
2 0.22464 0.46333 0.52783 0.49701 0.04867 0.05599 0.06575 0.00308 0.00097 0.01700 ... 0.60078 0.39777 0.00146 2.07382e-01 0.71734 0.07528 0.05804 0.00267 0.46503 0.47426

3 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1334278627385949
Completeness score:  0.016093857919452847
Homogeneity score:  0.029426563177585862 


----------Cluster Analysis----------

Number of clusters: 4
Size of Cluster 0 =  3812
Size of Cluster 1 =  4263
Size of Cluster 2 =  2459
Size of Cluster 3 =  3771

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.23049 0.46392 0.52652 0.49815 0.04579 0.05357 0.06565 0.00271 0.00092 0.01679 ... 0.57203 4.26397e-01 1.57439e-03 2.21202e-01 0.69877 0.08003 0.02781 0.00105 0.47442 0.49672
1 0.11818 0.64024 0.49343 0.48679 0.06319 0.06500 0.06894 0.01814 0.00328 0.00891 ... 0.99977 -7.21645e-16 2.34522e-04 5.27356e-16 0.99578 0.00422 0.05535 0.00610 0.89329 0.04526
2 0.07233 0.57035 0.49914 0.50194 0.04532 0.05388 0.06204 0.03985 0.00956 0.12973 ... 0.96218 3.37536e-02 4.06669e-03 -5.55112e-17 0.32574 0.67426 0.00203 0.00976 0.78406 0.20415
3 0.11651 0.61854 0.49352 0.49065 0.07234 0.07380 0.07622 0.08503 0.00504 0.01246 ... 0.99390 6.09918e-03 2.16840e-18 3.44736e-03 0.98913 0.00743 0.04110 0.00636 0.87298 0.07955

4 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.11352242091907526
Completeness score:  0.011107883199360064
Homogeneity score:  0.02772500571421824 


----------Cluster Analysis----------

Number of clusters: 5
Size of Cluster 0 =  2485
Size of Cluster 1 =  4111
Size of Cluster 2 =  1637
Size of Cluster 3 =  3681
Size of Cluster 4 =  2391

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.19024 0.42696 0.54180 0.50168 0.05609 0.06003 0.06473 0.00550 1.20724e-03 0.02374 ... 0.97787 2.01207e-02 2.01207e-03 1.52918e-01 0.78994 0.05714 0.05875 4.42656e-03 0.14286 0.79396
1 0.11382 0.64461 0.49250 0.48742 0.06060 0.06256 0.06894 0.01824 3.40550e-03 0.00851 ... 0.99976 -7.91034e-16 2.43250e-04 2.43250e-04 0.99514 0.00462 0.03503 5.10825e-03 0.92654 0.03333
2 0.28447 0.53696 0.49554 0.48962 0.03955 0.05264 0.06690 0.00020 1.38778e-17 0.00611 ... 0.00489 9.95113e-01 -2.81893e-18 2.88943e-01 0.57605 0.13500 0.03787 1.04083e-17 0.92242 0.03971
3 0.11518 0.61994 0.49338 0.49083 0.07238 0.07366 0.07640 0.08666 5.02581e-03 0.01222 ... 0.99674 3.25998e-03 1.30104e-18 5.43331e-04 0.99321 0.00625 0.03939 6.24830e-03 0.88346 0.07090
4 0.07079 0.56922 0.50216 0.50185 0.04538 0.05371 0.06238 0.04113 1.02468e-02 0.13342 ... 0.97867 1.67294e-02 4.60059e-03 -1.11022e-16 0.32915 0.67085 0.00209 9.61941e-03 0.79925 0.18904

5 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1304795384489958
Completeness score:  0.02564862841455512
Homogeneity score:  0.07298119283366054 


----------Cluster Analysis----------

Number of clusters: 6
Size of Cluster 0 =  2023
Size of Cluster 1 =  2340
Size of Cluster 2 =  1762
Size of Cluster 3 =  2577
Size of Cluster 4 =  3256
Size of Cluster 5 =  2347

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.20102 0.41024 0.54616 0.50396 0.04680 0.05261 0.06424 0.00445 7.41840e-04 0.01731 ... 0.97329 2.42334e-02 2.47280e-03 1.87933e-01 0.74184 0.07023 0.01879 1.48368e-03 0.01632 0.96340
1 0.12078 0.56728 0.50499 0.49244 0.08260 0.08334 0.07086 0.06578 8.11619e-03 0.02179 ... 1.00000 -5.13478e-16 -3.03577e-18 -1.45717e-16 0.99231 0.00769 0.05382 5.55318e-03 0.86758 0.07305
2 0.27101 0.53462 0.49629 0.49274 0.04054 0.05241 0.06691 0.00114 1.56125e-17 0.00908 ... 0.07605 9.23950e-01 -2.60209e-18 2.68445e-01 0.60670 0.12486 0.03519 6.93889e-18 0.92679 0.03802
3 0.11784 0.64435 0.49196 0.49164 0.07007 0.07201 0.07793 0.06881 2.91036e-03 0.00893 ... 0.99573 4.26853e-03 -3.03577e-18 7.76096e-04 0.99302 0.00621 0.04734 8.14901e-03 0.87776 0.06674
4 0.11080 0.65617 0.48892 0.48393 0.05595 0.05695 0.06860 0.02058 2.30344e-03 0.00768 ... 0.99969 -6.93889e-16 3.07125e-04 3.07125e-04 0.99539 0.00430 0.04607 5.22113e-03 0.92045 0.02826
5 0.07175 0.56945 0.50206 0.50107 0.04516 0.05424 0.06233 0.04133 1.04389e-02 0.13549 ... 0.97699 1.83213e-02 4.68683e-03 -1.31839e-16 0.31870 0.68130 0.00170 1.02258e-02 0.80145 0.18662

6 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1308407242428604
Completeness score:  0.0165445301127643
Homogeneity score:  0.05354370601449618 


----------Cluster Analysis----------

Number of clusters: 7
Size of Cluster 0 =  1558
Size of Cluster 1 =  2077
Size of Cluster 2 =  3285
Size of Cluster 3 =  1559
Size of Cluster 4 =  1618
Size of Cluster 5 =  1940
Size of Cluster 6 =  2268

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.13706 0.60013 0.50337 0.49484 0.09166 0.09270 0.07434 0.09713 6.73941e-03 0.02054 ... 0.99101 8.98588e-03 -2.60209e-18 6.41849e-04 0.98588 0.01348 0.05777 4.49294e-03 0.81900 0.11874
1 0.19650 0.40371 0.54800 0.50769 0.04649 0.05176 0.06403 0.00449 7.22195e-04 0.01926 ... 0.97352 2.40732e-02 2.40732e-03 1.82956e-01 0.75060 0.06644 0.01637 9.62927e-04 0.06259 0.92008
2 0.11114 0.65967 0.48762 0.48372 0.05586 0.05689 0.06861 0.02029 2.28311e-03 0.00791 ... 0.99970 -7.07767e-16 3.04414e-04 3.04414e-04 0.99543 0.00426 0.04505 5.17504e-03 0.92146 0.02831
3 0.11852 0.54554 0.51143 0.49087 0.08110 0.08172 0.06897 0.01155 8.01796e-03 0.02502 ... 0.98974 1.02630e-02 -2.60209e-18 -2.35922e-16 0.99423 0.00577 0.07056 8.98012e-03 0.85119 0.06928
4 0.28518 0.54141 0.49336 0.49011 0.03905 0.05241 0.06684 0.00021 1.38778e-17 0.00618 ... 0.00494 9.95056e-01 -2.81893e-18 2.92336e-01 0.57108 0.13659 0.03832 1.21431e-17 0.92151 0.04017
5 0.10839 0.66134 0.48532 0.49125 0.06047 0.06294 0.07942 0.08677 2.83505e-03 0.00361 ... 0.99897 1.03093e-03 -2.81893e-18 5.15464e-04 0.99536 0.00412 0.02784 8.24742e-03 0.91186 0.05206
6 0.06893 0.56570 0.50160 0.49950 0.04338 0.05247 0.06183 0.04071 9.92063e-03 0.13845 ... 0.97795 1.71958e-02 4.85009e-03 -1.73472e-16 0.29541 0.70459 0.00176 9.70018e-03 0.80115 0.18739

7 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.12747354496301677
Completeness score:  0.020355623644769293
Homogeneity score:  0.07099105550652084 


----------Cluster Analysis----------

Number of clusters: 8
Size of Cluster 0 =  2334
Size of Cluster 1 =  2290
Size of Cluster 2 =  1949
Size of Cluster 3 =  2389
Size of Cluster 4 =  1252
Size of Cluster 5 =  881
Size of Cluster 6 =  1626
Size of Cluster 7 =  1584

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.12062 0.56727 0.50575 0.49253 0.08266 0.08335 0.07084 0.06584 8.14053e-03 0.02228 ... 1.00000 -4.99600e-16 -3.46945e-18 -1.38778e-16 0.99186 0.00814 0.05398 5.56984e-03 0.86718 0.07326
1 0.12275 0.65437 0.48994 0.49715 0.07233 0.07388 0.07844 0.06827 2.40175e-03 0.00961 ... 0.99520 4.80349e-03 -3.46945e-18 4.36681e-04 0.99345 0.00611 0.04498 9.17031e-03 0.87424 0.07162
2 0.20587 0.40841 0.54739 0.50465 0.04656 0.05263 0.06448 0.00410 5.13084e-04 0.01642 ... 0.97229 2.56542e-02 2.05233e-03 1.94459e-01 0.73730 0.06824 0.01591 5.13084e-04 0.01437 0.96921
3 0.11045 0.59439 0.50281 0.48681 0.05907 0.05914 0.06873 0.03209 3.13939e-03 0.01005 ... 1.00000 -5.27356e-16 -3.03577e-18 1.25576e-03 0.99372 0.00502 0.06279 4.60444e-03 0.90038 0.03223
4 0.10460 0.72684 0.47029 0.47417 0.04862 0.05196 0.06925 0.00879 1.99681e-03 0.00958 ... 0.99920 4.16334e-17 7.98722e-04 -1.94289e-16 0.99521 0.00479 0.02077 7.18850e-03 0.94089 0.03115
5 0.07047 0.55221 0.50358 0.48698 0.03228 0.04546 0.04889 0.00303 1.13507e-03 0.24972 ... 0.93303 5.33485e-02 1.36209e-02 -1.31839e-16 0.06697 0.93303 0.00114 6.81044e-03 0.62316 0.36890
6 0.28733 0.52921 0.50167 0.49293 0.03963 0.05244 0.06724 0.00062 1.30104e-17 0.00615 ... 0.00492 9.95080e-01 -2.60209e-18 2.90898e-01 0.58057 0.12854 0.03813 1.12757e-17 0.92128 0.04059
7 0.07045 0.59186 0.49267 0.50495 0.05303 0.05945 0.06958 0.06124 1.48359e-02 0.06061 ... 0.99684 3.15657e-03 -2.81893e-18 -2.35922e-16 0.49874 0.50126 0.00189 1.07323e-02 0.88826 0.09912

8 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.11640526710076259
Completeness score:  0.0200291318648752
Homogeneity score:  0.0744125893821951 


----------Cluster Analysis----------

Number of clusters: 9
Size of Cluster 0 =  1565
Size of Cluster 1 =  2493
Size of Cluster 2 =  2176
Size of Cluster 3 =  864
Size of Cluster 4 =  1553
Size of Cluster 5 =  902
Size of Cluster 6 =  1271
Size of Cluster 7 =  2230
Size of Cluster 8 =  1251

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.07075 0.59265 0.49258 0.50560 0.05288 0.05963 0.06965 0.06113 1.50160e-02 0.06070 ... 9.96805e-01 3.19489e-03 -2.81893e-18 -2.49800e-16 0.50607 0.49393 0.00192 1.08626e-02 0.89457 0.09265
1 0.11100 0.59467 0.50269 0.48993 0.05844 0.05886 0.06895 0.04386 3.00842e-03 0.00963 ... 1.00000e+00 -5.41234e-16 -3.25261e-18 -3.46945e-17 0.99479 0.00521 0.06097 5.21460e-03 0.89130 0.04252
2 0.12375 0.65510 0.49046 0.49429 0.07347 0.07459 0.07858 0.05729 2.29779e-03 0.01011 ... 9.94945e-01 5.05515e-03 -3.25261e-18 4.59559e-04 0.99357 0.00597 0.04274 8.27206e-03 0.87500 0.07399
3 0.20630 0.37500 0.53750 0.51362 0.05093 0.04968 0.06511 0.00270 -7.80626e-18 0.02778 ... 9.17824e-01 7.52315e-02 6.94444e-03 5.25463e-01 0.33102 0.14352 0.02431 1.15741e-03 0.01389 0.96065
4 0.28389 0.55344 0.49646 0.49257 0.03996 0.05296 0.06704 0.00064 1.21431e-17 0.00580 ... 4.44089e-16 1.00000e+00 -2.60209e-18 2.58210e-01 0.60786 0.13393 0.03992 1.47451e-17 0.95943 0.00064
5 0.07099 0.54989 0.50495 0.48843 0.03305 0.04579 0.04936 0.00443 1.10865e-03 0.24390 ... 9.36807e-01 5.21064e-02 1.10865e-02 -1.38778e-16 0.06541 0.93459 0.00111 5.54324e-03 0.61530 0.37805
6 0.21178 0.42408 0.55504 0.49381 0.05094 0.06125 0.06481 0.00446 1.57356e-03 0.01259 ... 9.60661e-01 3.93391e-02 -2.38524e-18 -2.01228e-16 0.98899 0.01101 0.05193 3.93391e-03 0.06845 0.87569
7 0.11640 0.56906 0.50370 0.49329 0.07999 0.08093 0.07102 0.06846 8.29596e-03 0.02063 ... 1.00000e+00 -4.85723e-16 -3.03577e-18 -1.94289e-16 0.99327 0.00673 0.03543 4.48430e-03 0.89193 0.06816
8 0.10468 0.72742 0.47012 0.47437 0.04866 0.05192 0.06925 0.00879 1.99840e-03 0.00959 ... 9.99201e-01 4.16334e-17 7.99361e-04 -2.08167e-16 0.99520 0.00480 0.01998 7.19424e-03 0.94165 0.03118

9 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1223081638367673
Completeness score:  0.019666632515886146
Homogeneity score:  0.07667402773166006 


----------Cluster Analysis----------

Number of clusters: 10
Size of Cluster 0 =  1115
Size of Cluster 1 =  2280
Size of Cluster 2 =  2255
Size of Cluster 3 =  779
Size of Cluster 4 =  1252
Size of Cluster 5 =  1105
Size of Cluster 6 =  1600
Size of Cluster 7 =  1041
Size of Cluster 8 =  659
Size of Cluster 9 =  2219

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.21542 0.59013 0.46357 0.48798 0.04159 0.05478 0.06620 2.39163e-03 4.48430e-04 0.01525 ... 0.13004 8.69955e-01 -2.16840e-18 -1.87350e-16 9.79372e-01 0.02063 3.46945e-17 1.12757e-17 0.99821 0.00179
1 0.12191 0.65241 0.49141 0.49861 0.07229 0.07374 0.07848 6.90058e-02 2.41228e-03 0.00965 ... 0.99605 3.94737e-03 -3.25261e-18 4.38596e-04 9.95614e-01 0.00395 4.56140e-02 9.21053e-03 0.87807 0.06711
2 0.11671 0.56918 0.50368 0.49305 0.08154 0.08257 0.07089 6.79970e-02 8.42572e-03 0.02129 ... 1.00000 -4.71845e-16 -3.25261e-18 -1.94289e-16 9.92905e-01 0.00710 3.72506e-02 3.99113e-03 0.89401 0.06475
3 0.05789 0.53915 0.51385 0.47445 0.03193 0.04323 0.04788 3.42319e-03 6.41849e-04 0.26316 ... 0.94095 5.90501e-02 -1.30104e-18 -9.02056e-17 8.08729e-02 0.91913 1.28370e-03 6.41849e-03 0.68164 0.31065
4 0.10466 0.72764 0.47019 0.47463 0.04862 0.05194 0.06925 8.78594e-03 1.99681e-03 0.00958 ... 0.99920 4.16334e-17 7.98722e-04 -1.94289e-16 9.95208e-01 0.00479 1.99681e-02 7.18850e-03 0.94169 0.03115
5 0.20058 0.46516 0.50670 0.51288 0.05339 0.05561 0.06482 3.01659e-03 4.52489e-04 0.02443 ... 0.98281 2.71493e-03 1.44796e-02 3.45701e-01 4.62443e-01 0.19186 1.99095e-02 1.80995e-03 0.00271 0.97557
6 0.06929 0.59187 0.49218 0.50646 0.05211 0.05886 0.06931 6.04167e-02 1.46875e-02 0.06812 ... 0.99687 3.12500e-03 -2.81893e-18 -2.35922e-16 4.90000e-01 0.51000 1.87500e-03 1.06250e-02 0.89250 0.09500
7 0.21042 0.38521 0.57871 0.49657 0.04569 0.05632 0.06429 4.80307e-03 1.44092e-03 0.01153 ... 0.95485 4.51489e-02 -1.95156e-18 -1.73472e-16 9.87512e-01 0.01249 5.76369e-02 5.76369e-03 0.05476 0.88184
8 0.36742 0.42413 0.56586 0.50577 0.03907 0.04818 0.06907 4.16334e-17 -6.93889e-18 0.00152 ... 0.01214 9.87860e-01 -6.50521e-19 7.17754e-01 1.11022e-16 0.28225 9.40819e-02 -6.07153e-18 0.80728 0.09863
9 0.11195 0.60005 0.50214 0.48433 0.05935 0.05950 0.06877 3.34986e-02 2.92925e-03 0.00676 ... 1.00000 -4.85723e-16 -3.25261e-18 -1.87350e-16 9.94592e-01 0.00541 6.35421e-02 4.05588e-03 0.89049 0.04191

10 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.11595621133547078
Completeness score:  0.014462472791619366
Homogeneity score:  0.05857855895637752 


----------Cluster Analysis----------

Number of clusters: 11
Size of Cluster 0 =  957
Size of Cluster 1 =  1477
Size of Cluster 2 =  1545
Size of Cluster 3 =  1209
Size of Cluster 4 =  771
Size of Cluster 5 =  1602
Size of Cluster 6 =  1534
Size of Cluster 7 =  1265
Size of Cluster 8 =  1874
Size of Cluster 9 =  1126
Size of Cluster 10 =  945

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.21245 0.37931 0.58243 0.50087 0.04023 0.05248 0.06301 0.00453 1.56740e-03 0.00731 ... 0.94984 5.01567e-02 -1.95156e-18 -1.59595e-16 0.98746 0.01254 0.01567 5.20417e-18 0.04493 9.39394e-01
1 0.11467 0.63913 0.49495 0.48905 0.05962 0.05978 0.06873 0.09749 2.70819e-03 0.00677 ... 1.00000 -1.38778e-16 -2.60209e-18 -2.15106e-16 0.99255 0.00745 0.03656 3.38524e-03 0.90995 5.01016e-02
2 0.11893 0.54757 0.51138 0.49167 0.08078 0.08142 0.06909 0.01165 7.76699e-03 0.02654 ... 0.98964 1.03560e-02 -2.81893e-18 -2.42861e-16 0.99353 0.00647 0.06926 9.70874e-03 0.85307 6.79612e-02
3 0.10362 0.72705 0.46868 0.47502 0.04818 0.05155 0.06914 0.00662 1.65426e-03 0.00993 ... 0.99917 6.93889e-17 8.27130e-04 -1.87350e-16 0.99504 0.00496 0.02068 7.44417e-03 0.93962 3.22581e-02
4 0.05817 0.54021 0.51319 0.47289 0.03186 0.04316 0.04770 0.00346 6.48508e-04 0.26459 ... 0.94034 5.96628e-02 -1.30104e-18 -9.71445e-17 0.07912 0.92088 0.00130 6.48508e-03 0.67834 3.13878e-01
5 0.06933 0.59114 0.49310 0.50643 0.05220 0.05900 0.06933 0.06055 1.46692e-02 0.06804 ... 0.99688 3.12110e-03 -2.60209e-18 -2.35922e-16 0.49064 0.50936 0.00187 1.06117e-02 0.89014 9.73783e-02
6 0.28454 0.55834 0.49396 0.49259 0.03940 0.05271 0.06698 0.00022 1.21431e-17 0.00587 ... 0.00065 9.99348e-01 -2.60209e-18 2.61408e-01 0.60235 0.13625 0.04042 1.47451e-17 0.95958 -3.88578e-16
7 0.11048 0.55534 0.50801 0.49636 0.05761 0.05848 0.06942 0.05112 3.55731e-03 0.01265 ... 0.99921 7.90514e-04 -2.38524e-18 -2.08167e-16 0.99130 0.00870 0.08538 7.11462e-03 0.86482 4.26877e-02
8 0.12442 0.64381 0.49376 0.48929 0.07557 0.07616 0.07907 0.01281 2.40128e-03 0.01067 ... 0.99413 5.86980e-03 -3.03577e-18 5.33618e-04 0.99360 0.00587 0.04589 9.07150e-03 0.86820 7.68410e-02
9 0.20900 0.42496 0.51500 0.51368 0.05112 0.05373 0.06509 0.00178 4.44050e-04 0.02398 ... 0.92540 6.03908e-02 1.42096e-02 4.03197e-01 0.41918 0.17762 0.01865 1.12757e-17 0.00888 9.72469e-01
10 0.12400 0.62804 0.48964 0.49552 0.08241 0.08426 0.07631 0.14956 7.40741e-03 0.01376 ... 0.99683 3.17460e-03 -1.95156e-18 -1.45717e-16 0.98624 0.01376 0.02116 1.05820e-03 0.89312 8.46561e-02

11 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.12724275770428534
Completeness score:  0.017413474493036908
Homogeneity score:  0.0752377904448279 


----------Cluster Analysis----------

Number of clusters: 12
Size of Cluster 0 =  1328
Size of Cluster 1 =  2509
Size of Cluster 2 =  899
Size of Cluster 3 =  1232
Size of Cluster 4 =  751
Size of Cluster 5 =  931
Size of Cluster 6 =  1252
Size of Cluster 7 =  1838
Size of Cluster 8 =  1042
Size of Cluster 9 =  1021
Size of Cluster 10 =  711
Size of Cluster 11 =  791

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.13225 0.60241 0.51257 0.49885 0.08763 0.08795 0.07254 6.87751e-02 7.15361e-03 0.00828 ... 0.99096 9.03614e-03 -2.60209e-18 -2.15106e-16 9.92470e-01 7.53012e-03 8.35843e-02 3.76506e-03 8.13253e-01 0.09940
1 0.11243 0.61558 0.49753 0.49130 0.05856 0.05979 0.06976 8.13073e-02 2.98924e-03 0.00638 ... 0.99960 3.98565e-04 -3.46945e-18 1.19570e-03 9.92826e-01 5.97848e-03 6.37704e-02 5.57991e-03 8.91989e-01 0.03866
2 0.20135 0.51669 0.52676 0.47112 0.04192 0.05665 0.06621 5.19095e-03 5.56174e-04 0.02336 ... 0.33148 6.68521e-01 -1.51788e-18 -1.31839e-16 9.74416e-01 2.55840e-02 4.44939e-03 3.46945e-18 9.42158e-01 0.05339
3 0.11256 0.67167 0.48480 0.48785 0.06362 0.06420 0.08201 1.21753e-02 2.02922e-03 0.00325 ... 0.99919 8.11688e-04 -2.38524e-18 -2.01228e-16 9.97565e-01 2.43506e-03 1.29870e-02 9.74026e-03 9.22890e-01 0.05438
4 0.05861 0.54461 0.51055 0.46866 0.03212 0.04254 0.04764 3.10697e-03 6.65779e-04 0.26498 ... 0.94407 5.59254e-02 -1.30104e-18 -9.71445e-17 7.58988e-02 9.24101e-01 1.33156e-03 6.65779e-03 6.77763e-01 0.31425
5 0.05780 0.52685 0.49029 0.51178 0.03974 0.05150 0.06608 3.68779e-02 9.66702e-03 0.11171 ... 0.96778 2.14823e-02 1.07411e-02 -1.38778e-16 1.11022e-16 1.00000e+00 -2.08167e-17 4.29646e-03 7.99141e-01 0.19656
6 0.10577 0.72604 0.47226 0.47516 0.04897 0.05226 0.06934 8.78594e-03 1.99681e-03 0.00879 ... 0.99920 4.16334e-17 7.98722e-04 -1.94289e-16 9.96006e-01 3.99361e-03 2.07668e-02 7.18850e-03 9.40895e-01 0.03115
7 0.20366 0.41730 0.54424 0.50811 0.04801 0.05218 0.06384 3.98984e-03 8.16104e-04 0.01578 ... 0.99565 1.08814e-03 3.26442e-03 2.06202e-01 7.24701e-01 6.90968e-02 1.68662e-02 5.44070e-04 -8.88178e-16 0.98259
8 0.31114 0.52927 0.48321 0.51008 0.04001 0.05043 0.06867 9.02056e-17 -2.60209e-18 0.00096 ... 0.00768 9.92322e-01 -1.73472e-18 4.53935e-01 3.79079e-01 1.66987e-01 5.95010e-02 1.04083e-17 8.78119e-01 0.06238
9 0.10663 0.51665 0.49770 0.48240 0.07591 0.07713 0.06914 6.26836e-02 9.30460e-03 0.04310 ... 0.99412 5.87659e-03 -2.16840e-18 -1.66533e-16 9.89226e-01 1.07738e-02 1.37120e-02 9.79432e-03 9.31440e-01 0.04505
10 0.15129 0.59494 0.50903 0.49297 0.09687 0.09761 0.07314 1.31271e-02 2.81294e-03 0.02391 ... 0.98312 1.68776e-02 -1.08420e-18 1.40647e-03 9.76090e-01 2.25035e-02 1.04079e-01 7.03235e-03 7.36990e-01 0.15190
11 0.09726 0.65613 0.49830 0.51475 0.06329 0.06745 0.07159 7.96460e-02 1.89633e-02 0.01391 ... 1.00000 1.11022e-16 -1.30104e-18 -1.11022e-16 1.00000e+00 8.32667e-17 3.79267e-03 1.64349e-02 9.02655e-01 0.07712

12 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.12329681817124795
Completeness score:  0.015941302222331637
Homogeneity score:  0.07009803502548105 


----------Cluster Analysis----------

Number of clusters: 13
Size of Cluster 0 =  1459
Size of Cluster 1 =  1563
Size of Cluster 2 =  1230
Size of Cluster 3 =  1239
Size of Cluster 4 =  566
Size of Cluster 5 =  2131
Size of Cluster 6 =  680
Size of Cluster 7 =  909
Size of Cluster 8 =  535
Size of Cluster 9 =  975
Size of Cluster 10 =  883
Size of Cluster 11 =  825
Size of Cluster 12 =  1310

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.27752 0.58259 0.48371 0.49511 0.04014 0.05312 0.06689 0.00023 9.54098e-18 0.00617 ... 6.66134e-16 1.00000e+00 -2.81893e-18 2.24812e-01 0.63331 0.14188 1.17961e-16 1.64799e-17 1.00000 -3.88578e-16
1 0.06909 0.59213 0.49417 0.50766 0.05178 0.05881 0.06943 0.06163 1.50352e-02 0.06270 ... 9.97441e-01 2.55918e-03 -2.81893e-18 -2.35922e-16 0.49904 0.50096 1.91939e-03 1.08765e-02 0.90467 8.25336e-02
2 0.11231 0.67073 0.48577 0.48835 0.06387 0.06429 0.08205 0.01220 2.03252e-03 0.00325 ... 9.99187e-01 8.13008e-04 -2.16840e-18 -2.08167e-16 0.99675 0.00325 1.30081e-02 9.75610e-03 0.92764 4.95935e-02
3 0.10575 0.73043 0.47049 0.47571 0.04863 0.05184 0.06929 0.00834 2.01776e-03 0.00888 ... 9.99193e-01 5.55112e-17 8.07103e-04 -1.94289e-16 0.99516 0.00484 1.61421e-02 7.26392e-03 0.94512 3.14770e-02
4 0.09203 0.43286 0.52711 0.50212 0.04914 0.05353 0.06727 0.04770 4.41696e-03 0.03004 ... 9.98233e-01 1.76678e-03 0.00000e+00 0.00000e+00 0.98940 0.01060 1.76678e-01 8.83392e-03 0.77032 4.41696e-02
5 0.11691 0.64008 0.49449 0.48993 0.06097 0.06056 0.06942 0.08353 2.58095e-03 0.00469 ... 1.00000e+00 -4.44089e-16 -2.81893e-18 -2.22045e-16 0.99437 0.00563 3.23792e-02 4.22337e-03 0.91741 4.59878e-02
6 0.03611 0.54779 0.49587 0.48299 0.02647 0.03776 0.04564 0.00343 -6.93889e-18 0.30735 ... 9.32353e-01 6.76471e-02 -8.67362e-19 -4.85723e-17 0.07647 0.92353 1.47059e-03 7.35294e-03 0.77794 2.13235e-01
7 0.11832 0.62651 0.48841 0.49490 0.08230 0.08424 0.07652 0.15475 8.25083e-03 0.01430 ... 9.96700e-01 3.30033e-03 -1.73472e-18 -1.45717e-16 0.99010 0.00990 1.76018e-02 1.10011e-03 0.92299 5.83058e-02
8 0.21769 0.56729 0.48480 0.48523 0.06343 0.07379 0.06494 0.00436 1.86916e-03 0.02617 ... 9.60748e-01 9.34579e-03 2.99065e-02 2.77556e-17 0.54019 0.45981 1.86916e-03 3.73832e-03 0.00561 9.88785e-01
9 0.21501 0.39487 0.58130 0.49716 0.03949 0.05179 0.06317 0.00479 1.53846e-03 0.00718 ... 9.50769e-01 4.92308e-02 -1.95156e-18 -1.52656e-16 0.99487 0.00513 1.02564e-02 6.07153e-18 0.05026 9.39487e-01
10 0.11946 0.56229 0.50303 0.48543 0.08458 0.08477 0.07114 0.01284 2.83126e-03 0.03171 ... 9.73952e-01 2.60476e-02 -1.51788e-18 1.35900e-02 0.97508 0.01133 8.60702e-02 9.06002e-03 0.80974 9.51302e-02
11 0.21837 0.32000 0.55280 0.51927 0.04705 0.04564 0.06581 0.00283 -6.93889e-18 0.02545 ... 8.46061e-01 1.53939e-01 -1.51788e-18 6.25455e-01 0.26909 0.10545 1.00606e-01 -8.67362e-19 0.01333 8.86061e-01
12 0.13164 0.55534 0.51973 0.49224 0.08655 0.08803 0.06964 0.01196 8.39695e-03 0.02061 ... 9.89313e-01 1.06870e-02 -2.38524e-18 -2.08167e-16 0.99542 0.00458 8.16794e-02 7.63359e-03 0.85115 5.95420e-02

13 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1303911791253625
Completeness score:  0.017236767740334908
Homogeneity score:  0.07835551204620336 


----------Cluster Analysis----------

Number of clusters: 14
Size of Cluster 0 =  658
Size of Cluster 1 =  1815
Size of Cluster 2 =  1504
Size of Cluster 3 =  1634
Size of Cluster 4 =  543
Size of Cluster 5 =  971
Size of Cluster 6 =  830
Size of Cluster 7 =  786
Size of Cluster 8 =  1465
Size of Cluster 9 =  515
Size of Cluster 10 =  686
Size of Cluster 11 =  1241
Size of Cluster 12 =  893
Size of Cluster 13 =  764

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.03572 0.55167 0.48726 0.48273 0.02679 0.03742 0.04537 0.00152 -6.93889e-18 0.30547 ... 9.31611e-01 6.83891e-02 -8.67362e-19 -3.46945e-17 7.29483e-02 9.27052e-01 1.51976e-03 7.59878e-03 0.78267 0.20821
1 0.11744 0.60303 0.49879 0.48426 0.05885 0.05928 0.06791 0.03067 2.47934e-03 0.00606 ... 1.00000e+00 -3.33067e-16 -2.81893e-18 -2.56739e-16 9.95041e-01 4.95868e-03 7.27273e-02 4.95868e-03 0.87218 0.05014
2 0.28486 0.56383 0.48881 0.49164 0.04014 0.05268 0.06718 0.00022 1.12757e-17 0.00598 ... 6.66134e-16 1.00000e+00 -2.38524e-18 2.59309e-01 6.14362e-01 1.26330e-01 4.12234e-02 1.56125e-17 0.95811 0.00066
3 0.11149 0.64780 0.49134 0.49772 0.06193 0.06326 0.07970 0.09731 2.44798e-03 0.00306 ... 9.98776e-01 1.22399e-03 -2.81893e-18 -2.42861e-16 9.96940e-01 3.05998e-03 2.63158e-02 8.56793e-03 0.91065 0.05447
4 0.25801 0.35543 0.60472 0.50362 0.04742 0.06045 0.06759 0.00859 9.20810e-04 0.00921 ... 9.13444e-01 8.65562e-02 -2.16840e-19 1.38778e-17 9.94475e-01 5.52486e-03 7.73481e-02 1.84162e-03 0.12155 0.79926
5 0.13249 0.51493 0.53141 0.48946 0.05162 0.05584 0.06458 0.00687 3.08960e-03 0.01648 ... 9.98970e-01 1.02987e-03 -1.73472e-18 -1.59595e-16 9.96910e-01 3.08960e-03 1.44181e-02 3.08960e-03 0.46859 0.51390
6 0.04617 0.51988 0.49986 0.50667 0.03946 0.04970 0.06680 0.04096 1.02410e-02 0.12169 ... 9.73494e-01 2.65060e-02 -1.73472e-18 -1.24900e-16 -1.11022e-16 1.00000e+00 -4.16334e-17 6.02410e-03 0.90000 0.09398
7 0.09580 0.65840 0.49753 0.51671 0.06290 0.06711 0.07162 0.08015 1.90840e-02 0.01145 ... 1.00000e+00 1.11022e-16 -1.30104e-18 -1.04083e-16 1.00000e+00 8.32667e-17 3.81679e-03 1.65394e-02 0.90712 0.07252
8 0.11430 0.55222 0.50920 0.49308 0.07816 0.07899 0.06912 0.01138 8.19113e-03 0.02526 ... 9.90444e-01 9.55631e-03 -2.60209e-18 -2.28983e-16 9.94539e-01 5.46075e-03 4.64164e-02 7.50853e-03 0.88532 0.06075
9 0.21477 0.55340 0.49261 0.48291 0.06299 0.07379 0.06444 0.00583 1.94175e-03 0.02718 ... 9.59223e-01 9.70874e-03 3.10680e-02 3.46945e-17 5.26214e-01 4.73786e-01 1.94175e-03 3.88350e-03 0.00388 0.99029
10 0.14875 0.58236 0.51623 0.48542 0.09575 0.09734 0.07294 0.01361 2.91545e-03 0.02187 ... 9.66472e-01 3.35277e-02 -6.50521e-19 1.74927e-02 9.75219e-01 7.28863e-03 1.07872e-01 7.28863e-03 0.76676 0.11808
11 0.10522 0.72804 0.47129 0.47575 0.04865 0.05193 0.06927 0.00860 2.01450e-03 0.00886 ... 9.99194e-01 5.55112e-17 8.05802e-04 -2.01228e-16 9.95971e-01 4.02901e-03 2.01450e-02 7.25222e-03 0.94118 0.03143
12 0.11969 0.63326 0.48536 0.49418 0.08294 0.08505 0.07665 0.15677 7.83875e-03 0.01456 ... 9.96641e-01 3.35946e-03 -1.51788e-18 -1.31839e-16 9.89922e-01 1.00784e-02 1.79171e-02 1.11982e-03 0.92273 0.05823
13 0.19942 0.34620 0.54015 0.52120 0.04851 0.04627 0.06511 0.00305 -7.80626e-18 0.02749 ... 9.14921e-01 8.50785e-02 -1.08420e-18 5.94241e-01 2.90576e-01 1.15183e-01 2.74869e-02 -3.46945e-18 0.01571 0.95681

14 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.13478346832112867
Completeness score:  0.017174913851912877
Homogeneity score:  0.0802602724549679 


----------Cluster Analysis----------

Number of clusters: 15
Size of Cluster 0 =  1303
Size of Cluster 1 =  1455
Size of Cluster 2 =  1257
Size of Cluster 3 =  899
Size of Cluster 4 =  840
Size of Cluster 5 =  968
Size of Cluster 6 =  427
Size of Cluster 7 =  585
Size of Cluster 8 =  1012
Size of Cluster 9 =  892
Size of Cluster 10 =  1175
Size of Cluster 11 =  1226
Size of Cluster 12 =  723
Size of Cluster 13 =  750
Size of Cluster 14 =  793

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.11389 0.63968 0.49454 0.47741 0.06034 0.05982 0.06847 0.03965 2.68611e-03 0.00460 ... 1.00000 -1.38778e-17 -2.16840e-18 -2.22045e-16 9.94628e-01 0.00537 3.60706e-02 2.30238e-03 0.91711 4.45127e-02
1 0.27779 0.58454 0.48284 0.49317 0.03982 0.05321 0.06687 0.00023 8.67362e-18 0.00619 ... 0.00069 9.99313e-01 -2.81893e-18 2.21993e-01 6.35052e-01 0.14296 1.11022e-16 1.47451e-17 1.00000 -3.60822e-16
2 0.12795 0.55609 0.51888 0.49480 0.08448 0.08592 0.06974 0.01246 8.35322e-03 0.02228 ... 0.98886 1.11376e-02 -2.38524e-18 -2.08167e-16 9.94431e-01 0.00557 5.25060e-02 7.95545e-03 0.88544 5.40971e-02
3 0.12506 0.57175 0.49842 0.48610 0.08482 0.08521 0.07115 0.01187 2.78087e-03 0.02892 ... 0.97553 2.44716e-02 -1.73472e-18 1.33482e-02 9.69967e-01 0.01669 8.45384e-02 8.89878e-03 0.78865 1.17909e-01
4 0.06946 0.55179 0.49805 0.48214 0.03274 0.04485 0.04854 0.00317 1.19048e-03 0.24286 ... 0.92976 5.59524e-02 1.42857e-02 -1.11022e-16 6.07143e-02 0.93929 1.19048e-03 5.95238e-03 0.61548 3.77381e-01
5 0.16660 0.41632 0.55459 0.49852 0.03816 0.05083 0.06206 0.00379 2.06612e-03 0.01653 ... 0.95041 4.95868e-02 -2.16840e-18 -1.52656e-16 9.95868e-01 0.00413 1.44628e-02 2.06612e-03 0.18492 7.98554e-01
6 0.11457 0.69321 0.47428 0.53653 0.05899 0.06382 0.07546 0.31850 3.51288e-03 0.00703 ... 0.99766 2.34192e-03 8.67362e-19 4.16334e-17 9.92974e-01 0.00703 4.91803e-02 9.36768e-03 0.87354 6.79157e-02
7 0.23236 0.17949 0.58396 0.49732 0.05043 0.04474 0.06825 0.00171 -6.07153e-18 0.03590 ... 0.77436 2.25641e-01 -4.33681e-19 8.90598e-01 -1.11022e-16 0.10940 1.41880e-01 -6.07153e-18 0.02564 8.32479e-01
8 0.11108 0.56621 0.50737 0.49058 0.05892 0.05998 0.06946 0.02668 3.45850e-03 0.00692 ... 1.00000 1.94289e-16 -1.95156e-18 -1.52656e-16 9.92095e-01 0.00791 9.28854e-02 5.92885e-03 0.87451 2.66798e-02
9 0.07641 0.59978 0.48540 0.51947 0.05928 0.06424 0.07154 0.09380 1.90583e-02 0.07623 ... 0.99888 1.12108e-03 -1.51788e-18 -1.38778e-16 5.15695e-01 0.48430 -2.08167e-17 1.12108e-02 0.87892 1.09865e-01
10 0.10319 0.72340 0.47273 0.47801 0.04803 0.05109 0.06906 0.00567 1.70213e-03 0.00936 ... 0.99915 1.11022e-16 8.51064e-04 -1.87350e-16 9.95745e-01 0.00426 1.95745e-02 7.65957e-03 0.94043 3.23404e-02
11 0.11217 0.67129 0.48541 0.48850 0.06372 0.06434 0.08205 0.01223 2.03915e-03 0.00326 ... 0.99918 8.15661e-04 -2.38524e-18 -2.01228e-16 9.96737e-01 0.00326 1.30506e-02 8.97227e-03 0.92822 4.97553e-02
12 0.05979 0.58368 0.50194 0.49225 0.04305 0.05214 0.06659 0.01890 8.29876e-03 0.06362 ... 0.99447 5.53250e-03 -1.08420e-18 -6.93889e-17 4.57815e-01 0.54219 4.14938e-03 1.10650e-02 0.90180 8.29876e-02
13 0.25141 0.53333 0.52587 0.51524 0.05925 0.06513 0.06607 0.00533 1.33333e-03 0.01200 ... 0.98933 5.33333e-03 5.33333e-03 -8.32667e-17 9.18667e-01 0.08133 5.60000e-02 2.66667e-03 0.00533 9.36000e-01
14 0.12373 0.58701 0.50204 0.49142 0.08504 0.08625 0.07496 0.16982 8.82724e-03 0.01261 ... 0.99622 3.78310e-03 -1.08420e-18 -9.02056e-17 9.87390e-01 0.01261 2.01765e-02 -2.60209e-18 0.89912 8.07062e-02

15 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.12646336219070328
Completeness score:  0.016162323007434958
Homogeneity score:  0.07861447867786978 


----------Cluster Analysis----------

Number of clusters: 16
Size of Cluster 0 =  694
Size of Cluster 1 =  728
Size of Cluster 2 =  706
Size of Cluster 3 =  1035
Size of Cluster 4 =  1227
Size of Cluster 5 =  524
Size of Cluster 6 =  790
Size of Cluster 7 =  746
Size of Cluster 8 =  917
Size of Cluster 9 =  1091
Size of Cluster 10 =  991
Size of Cluster 11 =  850
Size of Cluster 12 =  786
Size of Cluster 13 =  1452
Size of Cluster 14 =  984
Size of Cluster 15 =  784

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.15059 0.59438 0.50903 0.49313 0.09717 0.09802 0.07317 0.01393 2.88184e-03 2.30548e-02 ... 0.98271 1.72911e-02 -6.50521e-19 1.44092e-03 0.97839 0.02017 1.06628e-01 7.20461e-03 0.73487 1.51297e-01
1 0.12247 0.63874 0.49670 0.48864 0.06405 0.06233 0.07102 0.05220 2.74725e-03 4.16334e-17 ... 1.00000 6.93889e-17 -1.08420e-18 -7.63278e-17 0.99588 0.00412 3.43407e-02 6.86813e-03 0.91896 3.98352e-02
2 0.21407 0.54391 0.51746 0.46752 0.04046 0.05482 0.06500 0.00189 7.08215e-04 2.54958e-02 ... 0.21530 7.84703e-01 -1.08420e-18 -6.93889e-17 0.96742 0.03258 -3.46945e-17 -5.20417e-18 0.99858 1.41643e-03
3 0.21701 0.40145 0.56364 0.49771 0.04088 0.05244 0.06302 0.00451 1.44928e-03 6.76329e-03 ... 0.95266 4.73430e-02 -2.16840e-18 -1.66533e-16 0.98841 0.01159 1.44928e-02 8.67362e-18 0.03865 9.46860e-01
4 0.11219 0.67196 0.48516 0.48894 0.06372 0.06430 0.08204 0.01222 2.03749e-03 3.25998e-03 ... 0.99919 8.14996e-04 -2.16840e-18 -1.87350e-16 0.99674 0.00326 1.30399e-02 8.96496e-03 0.92747 5.05297e-02
5 0.08220 0.46374 0.51651 0.50108 0.04747 0.05157 0.06637 0.05153 3.81679e-03 2.29008e-02 ... 0.99809 1.90840e-03 2.16840e-19 2.08167e-17 0.98855 0.01145 1.69847e-01 9.54198e-03 0.76908 5.15267e-02
6 0.12289 0.65190 0.48357 0.50422 0.08323 0.08509 0.07600 0.12321 8.22785e-03 1.39241e-02 ... 0.99620 3.79747e-03 -1.51788e-18 -9.71445e-17 0.98608 0.01392 2.02532e-02 1.26582e-03 0.90000 7.84810e-02
7 0.05836 0.54290 0.51418 0.46756 0.03125 0.04179 0.04761 0.00357 6.70241e-04 2.68097e-01 ... 0.95442 4.55764e-02 -1.08420e-18 -7.63278e-17 0.06971 0.93029 1.34048e-03 6.70241e-03 0.67158 3.20375e-01
8 0.08411 0.57306 0.47882 0.52854 0.06032 0.06810 0.07024 0.05707 1.85387e-02 7.96074e-02 ... 0.98691 2.18103e-03 1.09051e-02 -1.45717e-16 0.40131 0.59869 3.27154e-03 1.19956e-02 0.78299 2.01745e-01
9 0.11135 0.73831 0.47266 0.47385 0.04921 0.05270 0.06980 0.00764 1.83318e-03 5.49954e-03 ... 0.99908 1.80411e-16 9.16590e-04 -1.80411e-16 0.99633 0.00367 1.74152e-02 8.24931e-03 0.94042 3.39138e-02
10 0.30327 0.56206 0.47671 0.51322 0.04005 0.05111 0.06821 0.00034 -5.20417e-18 2.77556e-17 ... 0.00101 9.98991e-01 -1.73472e-18 4.04642e-01 0.39758 0.19778 6.25631e-02 7.80626e-18 0.93744 -5.55112e-17
11 0.20038 0.37118 0.53652 0.51808 0.05000 0.04826 0.06558 0.00196 -6.93889e-18 3.05882e-02 ... 0.91882 7.64706e-02 4.70588e-03 5.34118e-01 0.32941 0.13647 2.47059e-02 2.60209e-18 0.01176 9.63529e-01
12 0.10249 0.49746 0.50355 0.48227 0.07196 0.07223 0.06866 0.06658 1.01781e-02 4.58015e-02 ... 0.99746 2.54453e-03 -1.51788e-18 -9.71445e-17 0.99109 0.00891 1.01781e-02 1.14504e-02 0.94148 3.68957e-02
13 0.11444 0.63877 0.49244 0.48854 0.05962 0.05997 0.06867 0.09780 2.75482e-03 5.50964e-03 ... 1.00000 -1.52656e-16 -2.38524e-18 -2.35922e-16 0.99518 0.00482 3.78788e-02 2.75482e-03 0.90840 5.09642e-02
14 0.14320 0.57215 0.52300 0.48804 0.08543 0.08829 0.07039 0.00881 4.57317e-03 8.13008e-03 ... 0.98374 1.42276e-02 2.03252e-03 -1.73472e-16 0.99390 0.00610 9.95935e-02 6.09756e-03 0.71138 1.82927e-01
15 0.06594 0.61671 0.50417 0.49307 0.04177 0.04984 0.06727 0.05740 8.92857e-03 5.48469e-02 ... 0.99490 5.10204e-03 -1.30104e-18 -1.04083e-16 0.54464 0.45536 -3.46945e-17 8.92857e-03 0.91582 7.52551e-02

16 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.1306310592812013
Completeness score:  0.015053667552734037
Homogeneity score:  0.07536526400049233 


----------Cluster Analysis----------

Number of clusters: 17
Size of Cluster 0 =  1225
Size of Cluster 1 =  1029
Size of Cluster 2 =  1204
Size of Cluster 3 =  864
Size of Cluster 4 =  790
Size of Cluster 5 =  1223
Size of Cluster 6 =  592
Size of Cluster 7 =  563
Size of Cluster 8 =  946
Size of Cluster 9 =  815
Size of Cluster 10 =  498
Size of Cluster 11 =  561
Size of Cluster 12 =  397
Size of Cluster 13 =  644
Size of Cluster 14 =  784
Size of Cluster 15 =  694
Size of Cluster 16 =  1476

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.11241 0.67143 0.48542 0.48838 0.06383 0.06435 0.08212 1.22449e-02 2.04082e-03 3.26531e-03 ... 9.99184e-01 8.16327e-04 -2.38524e-18 -2.01228e-16 9.98367e-01 1.63265e-03 1.30612e-02 9.79592e-03 9.27347e-01 0.04980
1 0.12212 0.60253 0.51258 0.49683 0.08181 0.08224 0.07245 8.68157e-02 7.28863e-03 9.71817e-03 ... 9.88338e-01 1.16618e-02 -1.95156e-18 -1.66533e-16 9.94169e-01 5.83090e-03 5.83090e-02 3.88727e-03 8.41594e-01 0.09621
2 0.10356 0.72508 0.46936 0.47536 0.04822 0.05158 0.06915 6.64452e-03 1.66113e-03 9.96678e-03 ... 9.99169e-01 8.32667e-17 8.30565e-04 -1.87350e-16 9.95847e-01 4.15282e-03 2.15947e-02 7.47508e-03 9.38538e-01 0.03239
3 0.18102 0.50463 0.53194 0.50278 0.04514 0.05405 0.06044 5.40123e-03 1.15741e-03 1.04167e-02 ... 9.94213e-01 2.31481e-03 3.47222e-03 -1.17961e-16 9.93056e-01 6.94444e-03 1.27315e-02 3.47222e-03 6.66134e-16 0.98380
4 0.04135 0.53101 0.48909 0.49549 0.04066 0.05028 0.06709 4.30380e-02 1.07595e-02 1.26582e-01 ... 9.94937e-01 5.06329e-03 -1.51788e-18 -1.04083e-16 0.00000e+00 1.00000e+00 -3.46945e-17 5.06329e-03 8.98734e-01 0.09620
5 0.10962 0.55724 0.51032 0.49736 0.05821 0.05881 0.06965 5.04225e-02 3.67948e-03 1.22649e-02 ... 1.00000e+00 5.55112e-17 -2.38524e-18 -2.01228e-16 9.95094e-01 4.90597e-03 8.83074e-02 7.35895e-03 8.86345e-01 0.01799
6 0.37484 0.47044 0.55824 0.49437 0.03948 0.04949 0.06907 2.77556e-17 -6.07153e-18 4.16334e-17 ... 1.68919e-03 9.98311e-01 -4.33681e-19 7.11149e-01 0.00000e+00 2.88851e-01 1.04730e-01 -5.20417e-18 8.61486e-01 0.03378
7 0.13072 0.64387 0.47879 0.49964 0.08603 0.08912 0.07519 1.42096e-02 4.44050e-03 1.42096e-02 ... 9.94671e-01 5.32860e-03 -2.16840e-19 2.08167e-17 9.82238e-01 1.77620e-02 2.84192e-02 1.77620e-03 8.89876e-01 0.07993
8 0.23330 0.61099 0.45253 0.48407 0.03898 0.05507 0.06581 3.52361e-04 -5.20417e-18 7.39958e-03 ... 2.10942e-15 1.00000e+00 -1.73472e-18 -1.45717e-16 9.75687e-01 2.43129e-02 -1.38778e-17 5.20417e-18 9.98943e-01 0.00106
9 0.10202 0.50552 0.50297 0.48630 0.07224 0.07316 0.06885 7.48466e-02 1.04294e-02 4.53988e-02 ... 9.97546e-01 2.45399e-03 -1.30104e-18 -1.17961e-16 9.92638e-01 7.36196e-03 9.81595e-03 9.81595e-03 9.48466e-01 0.03190
10 0.19729 0.21084 0.56271 0.49772 0.05422 0.04558 0.06723 2.00803e-03 -5.20417e-18 4.21687e-02 ... 9.09639e-01 9.03614e-02 2.16840e-19 8.71486e-01 1.11022e-16 1.28514e-01 4.21687e-02 -4.33681e-18 2.00803e-02 0.93775
11 0.10830 0.54100 0.50727 0.50969 0.03699 0.05495 0.05348 3.56506e-03 1.78253e-03 1.35472e-01 ... 8.98396e-01 7.84314e-02 2.31729e-02 2.08167e-17 6.95187e-02 9.30481e-01 1.78253e-03 5.34759e-03 4.90196e-01 0.50267
12 0.03630 0.54408 0.50354 0.48010 0.03212 0.03520 0.04618 1.67926e-03 -2.60209e-18 3.29975e-01 ... 9.47103e-01 5.28967e-02 8.67362e-19 4.85723e-17 7.30479e-02 9.26952e-01 -1.38778e-17 7.55668e-03 6.87657e-01 0.30479
13 0.25697 0.39519 0.56450 0.50342 0.05008 0.06262 0.06723 6.21118e-03 1.55280e-03 7.76398e-03 ... 9.25466e-01 7.45342e-02 -6.50521e-19 -2.77556e-17 9.89130e-01 1.08696e-02 6.52174e-02 1.55280e-03 8.69565e-02 0.84627
14 0.09666 0.65816 0.49715 0.51488 0.06322 0.06738 0.07171 8.03571e-02 1.91327e-02 1.14796e-02 ... 1.00000e+00 1.24900e-16 -1.30104e-18 -9.02056e-17 1.00000e+00 1.11022e-16 3.82653e-03 1.65816e-02 9.05612e-01 0.07398
15 0.15083 0.59294 0.51042 0.49207 0.09753 0.09820 0.07333 1.39289e-02 2.88184e-03 2.30548e-02 ... 9.82709e-01 1.72911e-02 -1.08420e-18 1.44092e-03 9.81268e-01 1.72911e-02 1.06628e-01 7.20461e-03 7.37752e-01 0.14841
16 0.11495 0.63889 0.49554 0.48828 0.05954 0.05977 0.06878 9.75610e-02 2.71003e-03 5.42005e-03 ... 1.00000e+00 -1.38778e-16 -2.60209e-18 -2.35922e-16 9.92547e-01 7.45257e-03 3.65854e-02 2.03252e-03 9.11924e-01 0.04946

17 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.13270046750638992
Completeness score:  0.016553380183092447
Homogeneity score:  0.0837941523176528 


----------Cluster Analysis----------

Number of clusters: 18
Size of Cluster 0 =  313
Size of Cluster 1 =  975
Size of Cluster 2 =  862
Size of Cluster 3 =  770
Size of Cluster 4 =  1190
Size of Cluster 5 =  1224
Size of Cluster 6 =  552
Size of Cluster 7 =  807
Size of Cluster 8 =  969
Size of Cluster 9 =  725
Size of Cluster 10 =  336
Size of Cluster 11 =  897
Size of Cluster 12 =  499
Size of Cluster 13 =  1127
Size of Cluster 14 =  859
Size of Cluster 15 =  1018
Size of Cluster 16 =  536
Size of Cluster 17 =  646

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.15755 0.61502 0.52372 0.46454 0.10803 0.10359 0.07443 1.04366e-01 4.79233e-03 1.91693e-02 ... 0.97444 2.55591e-02 8.67362e-19 3.46945e-17 0.99361 0.00639 8.30671e-02 9.58466e-03 7.47604e-01 1.59744e-01
1 0.12374 0.53590 0.51438 0.48424 0.08417 0.08213 0.06974 1.19658e-02 5.12821e-03 3.38462e-02 ... 1.00000 2.08167e-16 -1.73472e-18 -1.59595e-16 0.99795 0.00205 1.23077e-01 9.23077e-03 8.55385e-01 1.23077e-02
2 0.07572 0.60267 0.48769 0.52146 0.05873 0.06383 0.07154 9.55143e-02 1.97216e-02 7.42459e-02 ... 1.00000 1.52656e-16 -1.30104e-18 -1.24900e-16 0.52668 0.47332 -2.77556e-17 1.16009e-02 9.04872e-01 8.35267e-02
3 0.19829 0.35325 0.53866 0.52130 0.04821 0.04601 0.06503 1.73160e-03 -6.93889e-18 2.72727e-02 ... 0.91558 8.44156e-02 -1.30104e-18 5.89610e-01 0.30260 0.10779 2.72727e-02 -2.60209e-18 1.29870e-02 9.59740e-01
4 0.10453 0.72353 0.47519 0.47686 0.04827 0.05141 0.06909 6.72269e-03 1.68067e-03 9.24370e-03 ... 0.99916 8.32667e-17 8.40336e-04 -1.94289e-16 0.99580 0.00420 1.84874e-02 7.56303e-03 9.41176e-01 3.27731e-02
5 0.11242 0.67075 0.48647 0.48813 0.06367 0.06440 0.08206 1.22549e-02 2.04248e-03 3.26797e-03 ... 0.99918 8.16993e-04 -2.16840e-18 -2.01228e-16 0.99673 0.00327 1.30719e-02 8.98693e-03 9.27288e-01 5.06536e-02
6 0.24658 0.55435 0.52094 0.45556 0.03770 0.05611 0.06421 2.08167e-17 -6.07153e-18 1.26812e-02 ... 0.00000 1.00000e+00 -2.16840e-19 1.38778e-17 0.95833 0.04167 -2.08167e-17 -5.20417e-18 9.98188e-01 1.81159e-03
7 0.11964 0.58736 0.49190 0.49703 0.07977 0.08138 0.07490 1.35894e-01 4.95663e-03 1.61090e-02 ... 0.99504 4.95663e-03 -1.30104e-18 1.23916e-03 0.99380 0.00496 6.31970e-02 4.95663e-03 8.27757e-01 1.04089e-01
8 0.10784 0.57585 0.51016 0.49966 0.06689 0.07193 0.06888 1.10079e-02 1.03199e-02 2.27038e-02 ... 0.99794 2.06398e-03 -1.51788e-18 -1.52656e-16 0.99174 0.00826 0.00000e+00 5.15996e-03 9.16409e-01 7.84314e-02
9 0.06011 0.58414 0.50342 0.49191 0.04310 0.05207 0.06662 1.88506e-02 8.27586e-03 6.34483e-02 ... 0.99448 5.51724e-03 -1.08420e-18 -6.24500e-17 0.45793 0.54207 4.13793e-03 1.10345e-02 9.04828e-01 8.00000e-02
10 0.11837 0.61310 0.49954 0.54355 0.06101 0.06391 0.07187 3.95833e-01 4.46429e-03 5.95238e-03 ... 0.99702 2.97619e-03 8.67362e-19 4.16334e-17 0.99405 0.00595 6.25000e-02 8.92857e-03 8.48214e-01 8.03571e-02
11 0.22079 0.38963 0.58458 0.50710 0.03874 0.05028 0.06264 2.60126e-03 1.11483e-03 4.45931e-03 ... 0.94760 5.23969e-02 -1.73472e-18 -1.24900e-16 0.99443 0.00557 1.11483e-03 3.46945e-18 8.88178e-16 9.98885e-01
12 0.21198 0.54910 0.49491 0.48029 0.06212 0.07320 0.06443 6.01202e-03 3.00601e-03 2.60521e-02 ... 0.95792 1.00200e-02 3.20641e-02 3.46945e-17 0.50701 0.49299 2.00401e-03 2.00401e-03 1.00200e-02 9.85972e-01
13 0.11067 0.76797 0.30894 0.48571 0.05784 0.05732 0.06822 3.93375e-02 3.10559e-03 5.32387e-03 ... 1.00000 1.38778e-16 -2.16840e-18 -1.80411e-16 0.99645 0.00355 4.16334e-17 1.77462e-03 9.39663e-01 5.85626e-02
14 0.12075 0.37776 0.74557 0.47761 0.05952 0.06153 0.06868 3.41482e-02 2.32829e-03 5.82072e-03 ... 1.00000 1.66533e-16 -1.51788e-18 -1.17961e-16 0.98952 0.01048 1.64144e-01 8.14901e-03 7.84633e-01 4.30733e-02
15 0.30347 0.54666 0.48400 0.51081 0.04009 0.05169 0.06807 9.02056e-17 -3.46945e-18 2.08167e-17 ... 0.00098 9.99018e-01 -1.95156e-18 3.93910e-01 0.39784 0.20825 6.09037e-02 8.67362e-18 9.39096e-01 -5.55112e-17
16 0.12982 0.64646 0.47991 0.49596 0.08722 0.09063 0.07559 1.43035e-02 4.66418e-03 1.49254e-02 ... 0.99440 5.59701e-03 0.00000e+00 2.77556e-17 0.98694 0.01306 2.98507e-02 1.86567e-03 9.06716e-01 6.15672e-02
17 0.02842 0.55805 0.48905 0.48179 0.02757 0.03611 0.04541 2.57998e-03 -6.93889e-18 3.14241e-01 ... 0.96594 3.40557e-02 -6.50521e-19 -2.77556e-17 0.08359 0.91641 1.54799e-03 7.73994e-03 7.67802e-01 2.22910e-01

18 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.12765320156130508
Completeness score:  0.016582252492294492
Homogeneity score:  0.08568346089103383 


----------Cluster Analysis----------

Number of clusters: 19
Size of Cluster 0 =  1056
Size of Cluster 1 =  890
Size of Cluster 2 =  449
Size of Cluster 3 =  1243
Size of Cluster 4 =  666
Size of Cluster 5 =  875
Size of Cluster 6 =  727
Size of Cluster 7 =  677
Size of Cluster 8 =  551
Size of Cluster 9 =  691
Size of Cluster 10 =  252
Size of Cluster 11 =  499
Size of Cluster 12 =  1970
Size of Cluster 13 =  422
Size of Cluster 14 =  470
Size of Cluster 15 =  665
Size of Cluster 16 =  1184
Size of Cluster 17 =  451
Size of Cluster 18 =  567

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.30861 0.52746 0.48632 0.51158 0.03954 0.05047 0.06843 0.00032 -1.73472e-18 6.93889e-18 ... 0.00095 9.99053e-01 -1.95156e-18 4.41288e-01 3.73106e-01 1.85606e-01 5.87121e-02 9.54098e-18 0.87973 0.06155
1 0.22142 0.38933 0.58364 0.50176 0.03933 0.05171 0.06334 0.00337 1.68539e-03 8.98876e-03 ... 0.94719 5.28090e-02 -1.95156e-18 -1.31839e-16 9.95506e-01 4.49438e-03 7.86517e-03 1.12360e-03 0.04045 0.95056
2 0.04860 0.53341 0.49336 0.52116 0.04329 0.05217 0.06789 0.06088 1.22494e-02 1.49220e-01 ... 1.00000 -6.93889e-17 4.33681e-19 4.85723e-17 1.11022e-16 1.00000e+00 -1.38778e-17 2.22717e-03 0.88641 0.11136
3 0.10523 0.72888 0.47121 0.47664 0.04862 0.05187 0.06925 0.00885 2.01126e-03 8.84956e-03 ... 0.99920 4.16334e-17 8.04505e-04 -2.01228e-16 9.95977e-01 4.02253e-03 1.85036e-02 7.24055e-03 0.94288 0.03138
4 0.14162 0.57658 0.51698 0.49404 0.09769 0.09779 0.07352 0.01401 3.00300e-03 2.25225e-02 ... 0.98198 1.80180e-02 -8.67362e-19 1.50150e-03 9.90991e-01 7.50751e-03 1.11111e-01 7.50751e-03 0.77628 0.10511
5 0.13193 0.58114 0.51811 0.48968 0.08550 0.08657 0.07086 0.00952 5.14286e-03 8.00000e-03 ... 0.98629 1.37143e-02 -1.95156e-18 -1.31839e-16 9.93143e-01 6.85714e-03 1.10857e-01 6.85714e-03 0.78857 0.09371
6 0.10221 0.50825 0.50069 0.49033 0.07325 0.07442 0.06766 0.03668 1.16919e-02 4.67675e-02 ... 0.99725 2.75103e-03 -1.08420e-18 -6.93889e-17 9.90371e-01 9.62861e-03 1.65062e-02 1.10041e-02 0.92985 0.04264
7 0.22067 0.54579 0.51917 0.46972 0.03997 0.05535 0.06494 0.00148 -6.93889e-18 1.92024e-02 ... 0.18464 8.15362e-01 -1.08420e-18 -5.55112e-17 9.66027e-01 3.39734e-02 -3.46945e-17 -5.20417e-18 0.99852 0.00148
8 0.12914 0.64428 0.47690 0.49800 0.08621 0.09020 0.07549 0.01452 4.53721e-03 1.45191e-02 ... 0.99456 5.44465e-03 0.00000e+00 2.08167e-17 9.87296e-01 1.27042e-02 2.90381e-02 1.81488e-03 0.90926 0.05989
9 0.18352 0.38495 0.53192 0.52793 0.05056 0.04682 0.06464 0.00193 -6.93889e-18 3.03907e-02 ... 1.00000 6.93889e-17 -8.67362e-19 5.62952e-01 3.40087e-01 9.69609e-02 3.03907e-02 1.44718e-03 0.01592 0.95224
10 0.08070 0.45437 0.53571 0.48810 0.03472 0.04812 0.05306 0.00794 8.67362e-19 5.55556e-02 ... 0.92857 7.14286e-02 4.33681e-19 4.16334e-17 1.66667e-01 8.33333e-01 3.96825e-03 7.93651e-03 0.28571 0.70238
11 0.22331 0.56713 0.48520 0.48444 0.06300 0.07350 0.06517 0.00401 3.00601e-03 2.40481e-02 ... 0.95792 1.00200e-02 3.20641e-02 3.46945e-17 5.39078e-01 4.60922e-01 2.00401e-03 2.00401e-03 0.00601 0.98998
12 0.11542 0.60355 0.50026 0.48667 0.05933 0.05945 0.06821 0.03283 2.79188e-03 6.09137e-03 ... 1.00000 -4.02456e-16 -2.81893e-18 -2.70617e-16 9.95431e-01 4.56853e-03 6.64975e-02 4.06091e-03 0.88528 0.04416
13 0.09300 0.48341 0.54785 0.45340 0.05895 0.06049 0.07074 0.03476 3.55450e-03 1.18483e-02 ... 1.00000 -8.32667e-17 8.67362e-19 3.46945e-17 9.92891e-01 7.10900e-03 4.26540e-02 2.36967e-02 0.78436 0.14929
14 0.09663 0.64681 0.49484 0.51553 0.06955 0.07261 0.07439 0.12057 2.44681e-02 1.70213e-02 ... 1.00000 -8.32667e-17 6.50521e-19 3.46945e-17 1.00000e+00 -1.66533e-16 -2.08167e-17 1.91489e-02 0.91702 0.06383
15 0.06312 0.59398 0.49569 0.49719 0.04492 0.05297 0.06695 0.01805 9.02256e-03 4.66165e-02 ... 0.99398 6.01504e-03 -8.67362e-19 -4.85723e-17 4.69173e-01 5.30827e-01 4.51128e-03 1.20301e-02 0.90075 0.08271
16 0.11315 0.69172 0.47720 0.49012 0.06361 0.06375 0.08246 0.01182 2.11149e-03 3.37838e-03 ... 0.99916 8.44595e-04 -2.38524e-18 -1.94289e-16 9.97466e-01 2.53378e-03 1.35135e-02 1.68919e-03 0.93412 0.05068
17 0.01334 0.59091 0.47250 0.46800 0.02370 0.03243 0.04221 0.00148 -4.33681e-18 4.27938e-01 ... 0.95787 4.21286e-02 8.67362e-19 4.16334e-17 4.43459e-02 9.55654e-01 -2.08167e-17 6.65188e-03 0.98448 0.00887
18 0.11776 0.62875 0.49593 0.52046 0.06779 0.06966 0.07466 0.42916 7.93651e-03 8.81834e-03 ... 0.99824 1.76367e-03 0.00000e+00 1.38778e-17 9.98236e-01 1.76367e-03 3.52734e-02 5.29101e-03 0.88360 0.07584

19 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.14235909452362555
Completeness score:  0.016165968848387748
Homogeneity score:  0.08353101411455131 

In [118]:
fig, ax = plt.subplots()
fig.set_size_inches(5, 5)
ax.plot(k_range, silhouettes_mean, label = "silhouette mean")
ax.plot(k_range, completeness, label="completeness")
ax.plot(k_range, homogeneity, label = "homogeneity")
plt.xlabel("K")
plt.legend(loc='lower right')
plt.show(); 

4.2 Qualitative Analysis of Clusters¶

In [119]:
# Best K: 

# Based on the above scores plot, k=5, k=7, and k=2 have relatively higher scores (Sil.mean, completeness and homogeneity)
# compared to others; However, we'll look at centroids for k=2 to see if we can detect interesting patterns:
In [120]:
sil,comp,hmg,clust,cent=cluster_analysis(X_norm,X_ssf.columns,y,plot_silhouettes,k=2)
----------Cluster Analysis----------

Number of clusters: 2
Size of Cluster 0 =  6104
Size of Cluster 1 =  8201

Cluster centroids:
lead_time arrival_date_year arrival_date_week_number arrival_date_day_of_month stays_in_weekend_nights stays_in_week_nights adults children babies is_repeated_guest ... deposit_type_No Deposit deposit_type_Non Refund deposit_type_Refundable agent_1 agent_listed_other agent_unknown customer_type_Contract customer_type_Group customer_type_Transient customer_type_Transient-Party
0 0.17278 0.48648 0.52169 0.49641 0.04530 0.05394 0.06369 0.01092 0.00377 0.06193 ... 0.71494 0.28244 0.00262 0.14007 0.55013 0.30980 0.03539 0.00508 0.56062 0.39892
1 0.11522 0.64181 0.48953 0.49122 0.06727 0.06863 0.07260 0.05361 0.00451 0.01097 ... 0.99902 0.00085 0.00012 0.00012 0.98549 0.01439 0.03487 0.00573 0.90416 0.05524

2 rows × 82 columns

Silhouette plot:

Mean Silhouette Value:  0.120709553994198
Completeness score:  0.020741610562666327
Homogeneity score:  0.02583576722378519 

In [121]:
# Analysis of Cluster centroids 0 and 1:

cent.T   # centroid 0: shows characteristics of profit; Centroid 1: loss
Out[121]:
0 1
lead_time 0.17278 0.11522
arrival_date_year 0.48648 0.64181
arrival_date_week_number 0.52169 0.48953
arrival_date_day_of_month 0.49641 0.49122
stays_in_weekend_nights 0.04530 0.06727
stays_in_week_nights 0.05394 0.06863
adults 0.06369 0.07260
children 0.01092 0.05361
babies 0.00377 0.00451
is_repeated_guest 0.06193 0.01097
previous_cancellations 0.00641 0.00040
previous_bookings_not_canceled 0.00439 0.00028
booking_changes 0.01431 0.01227
days_in_waiting_list 0.01229 0.00007
adr 0.16969 0.22445
required_car_parking_spaces 0.00635 0.00886
total_of_special_requests 0.04980 0.15986
hotel_City Hotel 0.69954 0.63419
hotel_Resort Hotel 0.30046 0.36581
arrival_date_month_April 0.08011 0.09474
arrival_date_month_August 0.08322 0.13730
arrival_date_month_December 0.05865 0.05463
arrival_date_month_February 0.07225 0.06938
arrival_date_month_January 0.04931 0.05036
arrival_date_month_July 0.08077 0.12498
arrival_date_month_June 0.08748 0.09609
arrival_date_month_March 0.07946 0.08255
arrival_date_month_May 0.09469 0.10230
arrival_date_month_November 0.06291 0.05158
arrival_date_month_October 0.12336 0.06987
arrival_date_month_September 0.12779 0.06621
meal_BB 0.79800 0.75503
meal_FB 0.01212 0.00171
meal_HB 0.16268 0.09511
meal_SC 0.02720 0.14815
country_DEU 0.05292 0.06438
country_FRA 0.05374 0.11413
country_PRT 0.71298 0.17327
country_listed_other 0.17317 0.64663
country_unknown 0.00721 0.00159
market_segment_Aviation 0.00344 0.00098
market_segment_Complementary 0.01245 0.00122
market_segment_Corporate 0.10174 0.00159
market_segment_Direct 0.16497 0.06231
market_segment_Groups 0.37385 0.00768
market_segment_Offline TA/TO 0.32208 0.11243
market_segment_Online TA 0.02146 0.81380
distribution_channel_Corporate 0.13041 0.00268
distribution_channel_Direct 0.20085 0.06414
distribution_channel_GDS 0.00082 0.00219
distribution_channel_TA/TO 0.66792 0.93098
distribution_channel_Undefined 0.00000 0.00000
reserved_room_type_A 0.90924 0.57469
reserved_room_type_B 0.00459 0.01256
reserved_room_type_C 0.00737 0.00988
reserved_room_type_D 0.03096 0.25948
reserved_room_type_E 0.02801 0.07426
reserved_room_type_F 0.01016 0.03597
reserved_room_type_G 0.00721 0.02622
reserved_room_type_H 0.00246 0.00695
reserved_room_type_L 0.00000 0.00000
assigned_room_type_A 0.78866 0.48957
assigned_room_type_B 0.01638 0.01853
assigned_room_type_C 0.02179 0.02170
assigned_room_type_D 0.09731 0.30021
assigned_room_type_E 0.04178 0.08365
assigned_room_type_F 0.01622 0.04170
assigned_room_type_G 0.00967 0.03158
assigned_room_type_H 0.00328 0.00817
assigned_room_type_I 0.00360 0.00280
assigned_room_type_K 0.00131 0.00207
assigned_room_type_L 0.00000 0.00000
deposit_type_No Deposit 0.71494 0.99902
deposit_type_Non Refund 0.28244 0.00085
deposit_type_Refundable 0.00262 0.00012
agent_1 0.14007 0.00012
agent_listed_other 0.55013 0.98549
agent_unknown 0.30980 0.01439
customer_type_Contract 0.03539 0.03487
customer_type_Group 0.00508 0.00573
customer_type_Transient 0.56062 0.90416
customer_type_Transient-Party 0.39892 0.05524
In [122]:
2.14746e-03
Out[122]:
0.00214746

Detecting patterns among cluster centroids:¶

centroid 0: has characteristics of a profit (Revenue=1) class
Centroid 1: has characteristics of revenue loss (Revenue=0) class

We can't conclude that the above finding holds for the majority of the data because the homogeneity and completeness scores are not high.


centroid 1: 

higher lead-time: It means reservations are made well ahead of the arrival date on average in this cluster, and thus they are more probable to get canceled (because guests have more time to find better rates elsewhere) and result in revenue loss.
lower stays_in_weekend_nights: Meaning those who were unable to book for the weekend are more likely to cancel their reservation and this would result in revenue loss for the hotel.
lower number of children: Those with less or no children are more likely to cancel and thus more likely to result in revenue loss for the hotels.
higher previous_cancellations: Meaning that those guests on average canceled past reservations frequently so they are highly probably to do it again and it results in revenue loss for the hotel.
higher days_in_waiting_list: Meaning that those who were on waitlist for a long time were highly probable to find offers elsewhere and call to request to be removed from waitlist, resulting in a revenue loss for the hotel.
lower number of total_of_special_requests: Meaning that guests who had little to no special requests were more likely to cancel because little to no special accomodations were available to them thus such reservations would result in profit for the hotels.
higher agent_1: Meaning that a lot of guests booked through this agent. From the exploratory data analysis steps, we know that majority of reservations booked through this agent get canceled, and so will the reservations in this cluster (revenue loss case).

centroid 0:  

lower lead-time: It means reservations are made closer to the arrival date on average, and thus they are less probable to get canceled (because guests don't have much time to find better rates elsewhere) and result in a profit.
higher stays_in_weekend_nights: Meaning those who stay over the weekend are less likely to cancel their reservation and this would result in revenue for the hotel.
higher number of children: Those with more children are less likely to cancel and thus more likely to result in profit for the hotels
lower previous_cancellations: Meaning that those guests rarely canceled any reservations in the past so they are highly probably to stay thus more likely to result in profit.
lower days_in_waiting_list: Meaning that those who did not go on waitlist or were on waitlist for a short time were highly probable to come and stay at the hotel, resulting in profit for the hotel.
higher number of total_of_special_requests: Meaning that guests who had higher number of special requests were more likely to come and stay because their many requests would be fulfilled, thus such reservations would result in profit for the hotels.
low agent_1: Meaning that very few guests booked through this agent. From the exploratory data analysis steps, we know that majority of reservations booked through this agent get canceled, and since these guests did not book through this agent, they are more likely to stay and thus that translates to revenue.

5 - Comparison and Conclusion¶

Below are the Overall Accuracies for each classifier:

KNN : 0.80(+/- 0.01)¶

Decision Trees : 0.82 (+/- 0.02)¶

Random Forest : : 0.85 (+/- 0.02)¶

SVM: 0.79 (+/- 0.01)¶

LDA: 0.78 (+/- 0.02)¶

Naïve Bayes: 0.53 (+/- 0.03)¶

Based on our analysis of each classifier, we conclude that Random Forest, being an ensemble model, outperformed the simple classifiers with an overall accuracy of 85%. Furthermore, the random forest is useful in handling larger datasets and can provide a higher level of accuracy.

In addition, the results, and observations from the unsupervised knowledge discovery (clustering) helped us in pattern recognition in the sense that the findings agreed with the information we had previously obtained from exploratory data analysis and visualizations.

In [ ]: